October 24, 2013

sbt 0.13 - Interaction Meets Automation

@jsuereth
sbt

sbt 0.13, released on August 26th, marks a big milestone for Scala build tools. Along with a nice set of fixes, faster incremental compilation, and a few options for sequencing settings, sbt 0.13 brings with it a brand new API for defining settings.

As of sbt 0.13, you can now create tasks via the new API:

val sayHello = taskKey[Unit]("Says hello on the console")
sayHello := println(s"Hello, ${name.value}")

Previously, when defining a new task in sbt, you'd need to do the following:

val sayHello = TaskKey[Unit]("sayHello", "Says hello on the console")
sayHello <<= name map { n =>  println("Hello, " + n) }

This API marks the next step in sbt's goal of providing the best possible tooling experience for Scala developers.  sbt aims to be the perfect blend of interaction and automation for building software.  The journey of sbt has been marked by a few key decisions which we'll explore now:

  1. Use Scala as a build language.
  2. Stay reactive and parallel.
  3. Interaction is as important as automation.

Decision 1 - Use Scala as the build language.

That was really the heart and soul of sbt: Scala. We love the language and wanted to see what a build tool written *in* and *for* Scala would look like. Not only that, but the build language itself should showcase the core features that make Scala compelling, in particular:

  • Type-safety
  • Functional Programming
  • Flexible syntax for task specific APIs/embedded domain specific languages

To take advantage of Scala's type inference and type checking, sbt has gone to lengths to ensure that preserving typed information throughout the build is as simple as possible.  In sbt, all tasks, all settings and everything built from these have types and these types can be used to eliminate a certain class of errors in programs, just like any Scala program.   Currently the plugin ecosystem has benefited from this level of type information, as plugins start to depend on each other and share information directly in code, rather than through the filesystem.  The level of information available in types hasn't yet been fully tapped via tooling, but as the ecosystem grows, we plan to leverage this information more and more.
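As a small illustration of what typed keys buy you, here is a sketch in the sbt 0.13 syntax shown earlier. The keys `maxGreetings` and `greetMany` are hypothetical and exist only for this example; `name` is the standard sbt setting.

```scala
// build.sbt sketch; `maxGreetings` and `greetMany` are hypothetical keys
val maxGreetings = settingKey[Int]("How many times to greet")

maxGreetings := 3

val greetMany = taskKey[Unit]("Greets several times")

greetMany := {
  // Both values carry their static types: Int and String
  val times: Int = maxGreetings.value
  val who: String = name.value
  (1 to times).foreach(i => println(s"Hello #$i, $who"))
}
```

Assigning a `String` to `maxGreetings` would be rejected when the build definition is compiled, before any task runs.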

There were also a few consequences to choosing Scala as a build language:

Consequences for choosing Scala

The first consequence is that Scala's compile times are slower than what most people expect from a build tool. Scala compilation speed is closer to that of C++, which gives the feel of a "slow" tool. This led to two developments inside of sbt:

  1. Large investment in caching and cache management
  2. Developing an advanced incremental compilation algorithm for Scala.

The caching and cache management of sbt is done to ensure that the build specification is only compiled as often as is absolutely necessary.   In fact, the build.sbt format is designed such that the file can be fragmented and partially recompiled as needed, to ensure the fastest possible startup time.  If you ever wondered why build.sbt requires a blank line between settings, this is the reason.  The faster we can chunk the file and check cached values, the faster we can get you into a working build.
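For illustration, a minimal build.sbt in this format might look like the following. The project values are placeholders. Each setting is its own expression, separated from the next by a blank line, so the file can be split into independent chunks.

```scala
// build.sbt sketch; the values below are placeholders
name := "hello"

version := "0.1.0"

scalaVersion := "2.10.3"
```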

The incremental compiler is another interesting development. When sbt was first being developed, incremental compilation in Scala was pretty naive. In fact, the incremental compiler is the main driver behind the existence of sbt in the first place. To improve the speed of Scala builds, it was necessary to limit the amount of work Scala had to do upon changing any given file. This led to a mechanism whereby sbt could track the work Scala did to compile changed files, and see if these changes required recompiling other files. This feature has grown a bit over the years, and is now exposed as its own library for all Scala tooling, called zinc. sbt's incremental compilation is now the de facto standard for all development tools working with Scala, from Ant and Gradle to IntelliJ and Eclipse.

Another consequence of having slower compilation times leads directly into the next decision in sbt:

Decision 2 - Stay reactive and parallel

If we have three tasks:

  1. Generate source files from protocol buffers.
  2. Generate source files from build properties.
  3. Compile all source files.

Then tasks 1 and 2 should be able to run in parallel, while task 3 must wait for the first two before executing. The only way a generic task execution system can do this is if the dependencies between the tasks are explicit. In Ant, you define tasks with explicit dependencies on other named tasks. In Make, you depend on "tasks" via the files they generate. In a Maven-style build, you have no way of knowing if one "goal" relies on the output of another "goal". You just read the filesystem, and sometimes it's ok, sometimes not. In Ant, you'd provide a dependency on the task, but you have no way of ensuring that the task generates the file you need, or that you're even depending on the right task. So, while Maven has to do by-project parallelization, and Ant can do task serialization assuming you wire the dependencies correctly, sbt tasks rely directly upon the things they require, and can be safely parallelized by default.
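The three-task example above could be sketched roughly as follows in the sbt 0.13 syntax. All three keys and the generated file names are hypothetical, and the task bodies are placeholders rather than real generators or a real compiler call. The point is only that the third task names the other two through `.value`, which is what makes the dependency explicit.

```scala
// build.sbt sketch; every key and file name here is hypothetical
val genProtoSources = taskKey[Seq[File]]("Generates sources from protocol buffers")

val genPropsSources = taskKey[Seq[File]]("Generates sources from build properties")

val compileAll = taskKey[Unit]("Consumes every generated source file")

genProtoSources := {
  // Placeholder: a real build would run a protocol buffer generator here
  Seq(target.value / "generated" / "Proto.scala")
}

genPropsSources := {
  // Placeholder: a real build would write a properties-derived source here
  Seq(target.value / "generated" / "BuildProps.scala")
}

compileAll := {
  // Depends on both generators; they do not depend on each other
  val sources = genProtoSources.value ++ genPropsSources.value
  println(s"Would compile ${sources.size} generated files")
}
```

Because neither generator refers to the other, nothing forces an order between them, while `compileAll` cannot start until both results are available.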

Consequences of being parallel by default

I remember the days of passing -j to make, and seeing if I had configured my build correctly. Usually things worked out after a bit of mucking with orderings and enforcing dependencies. Sometimes, though, if a section of the build was written too quickly, or too cleverly, then it would fail to parallelize. However, the benefit of using make with a two-core machine was evident for C++ builds. With Scala's compilation times leaning towards the heavy end, a build tool for Scala must support using more than one core. This means we have to be more careful when writing our tasks to ensure that they can run in parallel.

One such restriction is that all data passed between tasks must be explicit. If a task generates a file which is used by another task, it needs to "return" that file as something the other task can depend on (similar to old Makefiles). This is one reason why sbt exposes a key system. If you need to generate a file or directory, then create a task to do so and assign it to a TaskKey[File]. Then, any future task which needs to use this value can access it directly, without having to go check the filesystem for existence.
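A minimal sketch of this pattern, assuming two hypothetical keys (`generateVersionFile` and `printVersionFile`) and a made-up output file name. `IO.write` and `IO.read` are sbt's own file helpers.

```scala
// build.sbt sketch; both keys and the file name are hypothetical
val generateVersionFile = taskKey[File]("Writes the project version to a file and returns it")

val printVersionFile = taskKey[Unit]("Prints the file produced by generateVersionFile")

generateVersionFile := {
  val out = target.value / "version.txt"
  IO.write(out, version.value)
  // Returning the File is what lets other tasks depend on it
  out
}

printVersionFile := {
  // No filesystem lookup: the File comes straight from the other task
  val file = generateVersionFile.value
  println(IO.read(file))
}
```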

This can actually be a blessing. sbt also supports self-introspective builds.  This means you can inspect, for a given task/setting key:

  1. What tasks/settings depend on any given task/setting.
  2. Where the task was defined (file/line number, plugin, or sbt default)
  3. The current value of a setting.

Just try out the inspect or inspect tree commands in the sbt console.

Another restriction of parallelization is that files should be write-once. The days of reading/writing to the same file between tasks are gone. We can't be sure of exactly how tasks will be ordered, so it's always safer to just return a new file for a new task. Perhaps more important than parallelization, this leads to less error-prone builds. Not only that, but now tasks can do a better job avoiding work if they remember what they output previously. In sbt, "incremental" is something we try to push down into the guts of every task, as best as we are able.

After the parallelization and Scala decisions were made, the third decision flowed naturally from our users' (and our own) usage: A build should be interactive.

Decision 3 - Interaction is as important as automation

A fundamental misconception in most build tools is that they are solely used for automation. The reality is that, while automation is the primary goal for a build tool, interaction with a developer is where a lot of build tools see a significant amount of their usage. This could be anything from the build tool interacting with an IDE to developers who use text editors and run the build tool on the command prompt. This latter set of developers tends to gravitate towards sbt, thanks to its high interaction and value provided.

sbt's interactive features were somewhat driven, initially, by lack of IDE tooling for the Scala programming language. Before the time of the IntelliJ plugin, and in the infancy phase of the Scala IDE for Eclipse, there was sbt. Due to its advanced incremental compilation algorithm and snappy load times, early sbt users continued to push the tool as a companion for development. This led to the development of some truly awesome features for developers.

One example is "triggered execution" (~). This is where sbt will automatically watch the filesystem and rerun the given task upon changes to source code. This feature came about as a request for continuous compilation, where we were able to generalize, not just a specific task for continuous compilation, but a general build feature for any task.

Another compelling feature of sbt is the ability to "only rerun tests that my latest changes affect" (testQuick). This leverages the knowledge inside the incremental compilation structures to determine which unit tests are affected by the most recent code changes and only run these unit tests.

In general though, there are three features which provide the majority of interactivity for developers, and are the primary selling point of the tool:

  • Triggered execution (`~`)
  • InputTasks
  • Commands

These three provide varying levels of user-input and control for the build, but ultimately provide the flexibility that helps our ecosystem create a great integrated build environment for its users.

Triggered execution is where sbt will automatically check for modifications to source files (or new source files) and re-run the desired task. This is a great mechanism to combine with test driven development or general bug maintenance.   Write your tests, start up ~ testQuick and make your changes until everything passes the test suite.  Rather than running every test with every change, you'll only be re-running your unit tests that matter.

InputTasks provide the ability to run tasks that also (optionally) require user input. Not only do they take user input, but the input parsing library has built-in support for autocomplete. Don't remember the name of the class you want to run? Type run-main and let sbt supply you with the available options. This autocomplete feature is exposed so all plugins can make use of it.
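As a rough sketch of what defining an input task looks like with the 0.13 API: the key `greet` is hypothetical, while `inputKey`, `spaceDelimited`, and `.parsed` come from sbt itself. The parser here simply splits the input on spaces, which is far simpler than the class-name completion described above.

```scala
// build.sbt sketch; the `greet` key is hypothetical
import sbt.complete.DefaultParsers._

val greet = inputKey[Unit]("Greets each name given on the command line")

greet := {
  // Parse the user's input into space-separated arguments
  val names: Seq[String] = spaceDelimited("<name>").parsed
  names.foreach(n => println(s"Hello, $n"))
}
```

In the sbt console this would be invoked as `greet Alice Bob`.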

Commands provide a mechanism whereby the raw structure of sbt can be adjusted. Plugins like sbt-release take advantage of this to make manipulations to settings and tasks in a build and construct "workflows" of tasks that need to be executed. This allows features like cross-versioning to be used while still having a "one button" release, as the command will take care of reconfiguring settings.
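A very small sketch of a command that strings existing commands into a workflow. The name `cleanTest` is hypothetical, and this is not how any particular plugin is implemented; it only shows the shape of the API, in which a command receives the build `State` and returns a new one.

```scala
// build.sbt sketch; the command name `cleanTest` is hypothetical
def cleanTest = Command.command("cleanTest") { state =>
  // Prepend two existing commands so they run next, in this order
  "clean" :: "test" :: state
}

commands += cleanTest
```

Since the command works on the whole `State` rather than on a single task's inputs, it can also change settings before scheduling further work, which ordinary tasks cannot do.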

Consequences

The consequence of all this interaction is that most sbt users are leveraging the sbt console continuously during development. sbt is no longer just about build automation, but is a companion/essential piece of the development environment.

Where are we now?

So what's next? Besides encouraging our plugin authors to continue to explore the space of automation for every task in the enterprise, we plan to push the interaction and reactivity of sbt to new levels. There already exists some great IDE tooling with the IntelliJ IDEA project. It's our aim to make sure that the ways in which users are pushing for sbt usage become first-class. The sbt console should be able to live alongside whatever development environment you have, as a first-class component. This should also help us deliver new features into our toolchain that can reach you, the developer, regardless of what environment you use. For example, the recently added `testQuick` would be nice to use inside IntelliJ/Eclipse/SublimeText/Emacs, you name it.

To that end, we're working on an interface protocol, and server specification for the next version of sbt. We encourage you to join in the discussion and the future of development.
