Bazel: what you give, what you get

June 14, 2021
Alex Eagle
Founder & CTO at Aspect.Dev

There are a few ways I like to describe Bazel to an engineer who hasn't used it. If they have used similar build tools like Gradle or Make, I'll usually start by comparing the configuration affordances or the differences in execution strategies. But most engineers have only used the canonical tooling for the language they write, and only interact with it superficially. After all, they're busy writing the code, and the build system normally tries to hide behind the scenes. With Bazel, we're asking engineers to understand a bit more, and here's where I like to start:

Bazel offers you this proposition: you describe the dependencies of your application, and Bazel will keep your outputs up-to-date.

Describe your dependencies

Most build systems allow any code to depend on anything. As a result, they are limited in how aggressively they can minimize re-build times. Bazel instead asks you to declare dependencies explicitly. This is extra work you'll need to do to get Bazel's benefits.

Your job is to describe your sources, by grouping them into "targets". For example, "a TypeScript library", "a Go package", "a dynamic-linked Swift library", etc. You say which source files in your repo are part of each target, and then what other targets it "depends" on. Sometimes you can give some other bits of description, like the module name this code should be imported by, or options for compiling it. Sometimes you'll have to indicate runtime dependencies as well, such as some data file read during one of your tests.

That's it - you don't have to tell Bazel what to do with these sources.
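To make "targets" and "depends on" concrete, here is a minimal Python sketch of the information you hand over. It is a toy model, not Bazel's API or BUILD file syntax: the Target class, the labels, and the file paths are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Toy stand-in for a build target; not Bazel's real data model."""
    name: str
    srcs: tuple = ()  # source files that belong to this target
    deps: tuple = ()  # names of other targets this one depends on
    data: tuple = ()  # runtime-only files, such as test fixtures


# Hypothetical targets, invented for this example.
TARGETS = {
    t.name: t
    for t in [
        Target("//lib/strings", srcs=("lib/strings/pad.ts",)),
        Target("//lib/http", srcs=("lib/http/client.ts",), deps=("//lib/strings",)),
        Target(
            "//app:test",
            srcs=("app/main_test.ts",),
            deps=("//lib/http",),
            data=("app/testdata/response.json",),
        ),
    ]
}


def transitive_deps(name, targets=TARGETS):
    """Return every target reachable from `name` through deps, sorted."""
    seen = set()
    stack = list(targets[name].deps)
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(targets[dep].deps)
    return sorted(seen)


print(transitive_deps("//app:test"))
```

Notice that nothing here says how to compile or test anything. The description is only "what belongs together" and "what needs what", which is the part Bazel asks you to supply.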

The amount of work varies. Since your source code generally hints at the dependencies (like with import statements) it's possible for tooling to automate 80% of the work, and such BUILD file generators exist for a small but growing number of languages. It's also up to you how detailed to be - you can just make one coarse-grained target saying "all the Java files in this whole directory tree are one big library", or you could make fine-grained ones for each subdirectory, or something in the middle.

The more correct your dependency graph, the more guarantees Bazel provides. If your graph is missing some inputs, then Bazel can't know to invalidate caches when those inputs change. This is called non-hermeticity. If your tools produce different outputs for the same inputs (for example by including a timestamp or using non-stable ordering), then Bazel will be less incremental than it should be, since dependent targets will have to re-build. This is called non-determinism.
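The non-hermeticity problem can be shown with a small Python sketch. It assumes a simplified cache whose key is a hash of the declared inputs only; the file names and the cache_key and run_tool helpers are hypothetical, and Bazel's real action keys include more than this.

```python
import hashlib


def cache_key(declared_inputs, files):
    """Hash the contents of the declared inputs only (toy model)."""
    digest = hashlib.sha256()
    for path in sorted(declared_inputs):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(files[path].encode())
        digest.update(b"\0")
    return digest.hexdigest()


def run_tool(files):
    """Pretend tool that reads config.txt whether or not it was declared."""
    return files["main.txt"] + files["config.txt"]


before = {"main.txt": "hello ", "config.txt": "v1"}
after = {"main.txt": "hello ", "config.txt": "v2"}

# config.txt is missing from the declared inputs: the key does not move,
# so a cache would serve the old output even though the real output changed.
incomplete = ["main.txt"]
stale_hit = cache_key(incomplete, before) == cache_key(incomplete, after)
outputs_differ = run_tool(before) != run_tool(after)

# With config.txt declared, the key changes and the step would be re-run.
complete = ["main.txt", "config.txt"]
invalidated = cache_key(complete, before) != cache_key(complete, after)

print(stale_hit, outputs_differ, invalidated)
```

The first two flags together are the bug: the key says "nothing changed" while the tool's output did change. Declaring the missing input is what restores correct invalidation.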

As a side benefit of describing your dependencies, sometimes you'll also discover undesired dependencies, so you can fix those and/or add constraints to prevent bad dependencies from being introduced in your code.

Keeps your outputs up-to-date

In exchange for your work in describing your dependencies, you get a fantastic property: fast, incremental, and correct outputs.

Your outputs are a filesystem tree, usually in the bazel-out folder. Bazel populates some subset of this tree depending on what you ask for. If you ask for the default outputs of a Java library, Bazel places a .jar file in the output tree. If you ask for a test to be run, Bazel places the exit code of that test runner in the output tree (representing the pass/fail status).

Bazel does the minimum work required to update the output tree. In the trivial case, Bazel queries the dependency graph and determines that the inputs to a given step are the same as in a previous build, and does no work. This "cache hit" is the common case. If you don't have a cache hit locally on your machine, Bazel can fetch the result from a remote cache, provided one is configured and already holds that result.
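The lookup order described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: caches are plain dictionaries, the lookup function and its arguments are hypothetical, and it assumes results may be written back to the remote cache, which is a policy choice in a real setup.

```python
def lookup(key, local_cache, remote_cache, execute):
    """Return (result, source) for one build step in a toy two-tier cache."""
    if key in local_cache:
        return local_cache[key], "local"
    if remote_cache is not None and key in remote_cache:
        local_cache[key] = remote_cache[key]  # keep a local copy for next time
        return local_cache[key], "remote"
    result = execute()  # cache miss everywhere: do the actual work
    local_cache[key] = result
    if remote_cache is not None:
        remote_cache[key] = result  # assumption: uploads are allowed
    return result, "executed"


remote = {}
alice_local, bob_local = {}, {}

first = lookup("compile:abc123", alice_local, remote, lambda: "lib.jar")
second = lookup("compile:abc123", alice_local, remote, lambda: "lib.jar")
third = lookup("compile:abc123", bob_local, remote, lambda: "lib.jar")

print(first, second, third)
```

Here the second machine never runs the step itself: it finds the result that the first machine already produced.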

If you change one file, then any nodes in the dependency graph that directly depend on it must be re-evaluated. That might mean a compiler is re-run. However, if the result is the same as in a previous run, then nothing further downstream needs to be rebuilt. This avoids "cascading re-builds" where a whole spine of the tree is re-evaluated.
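Here is a minimal Python sketch of that cutoff behavior, assuming each step is cached by a hash of its input. The ToyBuild class and the pretend compile and link steps are hypothetical; the "compiler" simply drops comment lines so that a comment-only edit produces an identical output.

```python
import hashlib


def digest(text):
    return hashlib.sha256(text.encode()).hexdigest()


class ToyBuild:
    """Caches each step by (step name, input digest) and logs real executions."""

    def __init__(self):
        self.cache = {}
        self.log = []

    def run(self, step, fn, input_text):
        key = (step, digest(input_text))
        if key not in self.cache:
            self.log.append(step)  # the step actually executed
            self.cache[key] = fn(input_text)
        return self.cache[key]


def compile_step(source):
    # Pretend compiler: comment lines do not affect the output.
    lines = source.splitlines()
    return "\n".join(l for l in lines if not l.lstrip().startswith("#"))


def link_step(obj):
    return "BINARY(" + obj + ")"


def build(b, source):
    obj = b.run("compile", compile_step, source)
    return b.run("link", link_step, obj)


b = ToyBuild()
build(b, "x = 1\n# old comment")
build(b, "x = 1\n# new comment")  # input changed, compiled output did not
print(b.log)  # ['compile', 'link', 'compile']
```

The second build re-runs the compile step because its input changed, but the link step is skipped because the compiled output it consumes is byte-for-byte the same as before.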

There are a lot of things you can do with an incrementally-updated output tree. For example, you can set up your CI to just run bazel test //... (test everything) and then rely on Bazel's incrementality and caching to be sure only the minimal build and test work happens for each change.

There's a lot more to Bazel, but I find this description fits well in a two-minute attention span and conveys the basic value proposition.
