What is a build system and what is CI?

October 26, 2021
Alex Eagle
Founder & CTO at Aspect.Dev

For a long time, I thought I knew the answer to that question. A build system understands your software: how to build it and how to test it. And CI is a loop that runs the build system on a server.

When I was the tech lead for Angular CLI, I asked a lot of our big corporate users "what build system do you currently use?" and the most common response was "Jenkins". Of course, given my preconceptions about what these terms mean, I thought they were just wrong.

It turns out they were right, because they had turned Jenkins into the build system. They probably started small, with one build system for the frontend code (let's say npm scripts) and another for the backend (let's say Maven), and at that point Jenkins would have run these independently. As things got complex and interconnected, they needed integration tests, and unsurprisingly the one common place to add them was Jenkins itself, through some Groovy code or a plugin. (There should be a software aphorism: "any tool with sufficient adoption grows a plugin ecosystem, thereby rendering it redundant with the tools it should have complemented.")

Now they've created a build system you can't run on your local machine, only in the CI environment, and kinda ruined CI for everyone. Engineers have to wait forever for a CI round trip to get something green, because it's too hard to reproduce a failure locally and fix it. It's sad, but understandable given the way this evolved.

There are build systems that are meant to generalize across the stack and be locally reproducible (Bazel, of course), but what I've learned is that to sell that solution, you have to frame it as replacing CI, not replacing the build system. After switching to Bazel, you actually have a "CI" you can run locally on your machine, plus a server that runs the same thing in a loop. And you try not to get too hung up on how the term "CI" lost all its meaning in the process.

Bazel: what you give, what you get

There are a few ways I like to describe Bazel to an engineer who hasn't used it. If they have used similar build tools like Gradle or Make, I usually start by comparing the configuration affordances or the differences in execution strategy. But most engineers have only used the canonical tooling for the language they write, and interact with it only superficially. After all, they're busy writing the code, and the build system normally tries to stay behind the scenes. With Bazel, we're asking engineers to understand a bit more, and here's where I like to start:

Bazel offers you this proposition: you describe the dependencies of your application, and Bazel will keep your outputs up-to-date.

Describe your dependencies

Most build systems allow any code to depend on anything, so they are limited in how aggressively they can minimize rebuild times. Describing your dependencies is the extra work you'll need to do to get Bazel's benefits.

Your job is to describe your sources by grouping them into "targets": for example, "a TypeScript library", "a Go package", or "a dynamically linked Swift library". You say which source files in your repo are part of each target, and which other targets it "depends" on. You can sometimes add other details, like the module name this code should be imported by, or options for compiling it. Sometimes you'll also have to declare runtime dependencies, such as a data file read during one of your tests.
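To make that concrete, here is a minimal sketch, in plain Python rather than Bazel's own Starlark, of the information a target description carries: a name, its sources, the targets it depends on, and runtime data. The labels and file paths are hypothetical, and this models the idea rather than how Bazel stores targets internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A simplified, hypothetical model of one target description."""
    name: str
    srcs: tuple[str, ...] = ()
    deps: tuple[str, ...] = ()  # other targets this one depends on
    data: tuple[str, ...] = ()  # runtime-only files, e.g. read by a test


# Hypothetical targets; labels and paths are made up for illustration.
TARGETS = {
    "//api:model": Target("//api:model", srcs=("api/model.ts",)),
    "//app:ui": Target("//app:ui", srcs=("app/ui.ts",), deps=("//api:model",)),
    "//app:ui_test": Target(
        "//app:ui_test",
        srcs=("app/ui.test.ts",),
        deps=("//app:ui",),
        data=("app/testdata/fixture.json",),
    ),
}


def transitive_deps(label: str, targets: dict[str, Target]) -> set[str]:
    """Return every target reachable from `label` by following `deps` edges."""
    seen: set[str] = set()
    stack = list(targets[label].deps)
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(targets[dep].deps)
    return seen


print(sorted(transitive_deps("//app:ui_test", TARGETS)))
```

Note that nothing here says *how* to compile a TypeScript file; the description is only about what belongs together and what depends on what.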

That's it - you don't have to tell Bazel what to do with these sources.

The amount of work varies. Since your source code generally hints at the dependencies (like with `import` statements) it's possible for tooling to automate 80% of the work, and such BUILD file generators exist for a small, increasing number of languages. It's also up to you how detailed to be - you can just make one coarse-grained target saying "all the Java files in this whole directory tree are one big library", or you could make fine-grained ones for each subdirectory, or something in the middle.
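A BUILD file generator works roughly like this: read each source file's `import` statements and map every imported module to the target that owns it. The sketch below is a toy under loud assumptions: a regex that only understands TypeScript-style relative imports, and a hypothetical one-target-per-directory convention where `//api` names the target for the `api/` directory. Real generators (Gazelle, for Go, is a well-known one) parse sources properly and handle far more cases.

```python
import posixpath
import re

# Matches `import ... from "./x"` style statements (single or double quotes).
IMPORT_RE = re.compile(r"""^\s*import\s.*?\sfrom\s+['"]([^'"]+)['"]""", re.MULTILINE)


def infer_deps(path: str, source: str) -> set[str]:
    """Infer dependency labels for relative imports, assuming one target per directory."""
    here = posixpath.dirname(path)
    deps: set[str] = set()
    for module in IMPORT_RE.findall(source):
        if not module.startswith("."):
            continue  # third-party packages are out of scope for this sketch
        resolved = posixpath.normpath(posixpath.join(here, module))
        pkg = posixpath.dirname(resolved)
        if pkg != here:  # imports within the same directory stay in the same target
            deps.add("//" + pkg)
    return deps


example = '''
import { Model } from "../api/model";
import { helper } from "./helper";
import * as React from "react";
'''
print(infer_deps("app/ui.ts", example))
```

Even a crude generator like this gets you most of the way; the remaining work is the part only a human knows, such as runtime data files.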

The more correct your dependency graph, the more guarantees Bazel provides. If your graph is missing some inputs, then Bazel can't know to invalidate caches when those inputs change. This is called non-hermeticity. If your tools produce different outputs for the same inputs (like including a timestamp or non-stable ordering), then Bazel will be less incremental than it should be, since dependent targets will have to rebuild. This is called non-determinism.
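Both failure modes show up in how a cache key is computed. The sketch below assumes, as a simplification, that an action's cache key is a hash of its command line plus the contents of its *declared* inputs (Bazel's real action keys cover more than this). An undeclared input never enters the key, so editing it can't invalidate the cache; a tool that stamps the build time produces different output bytes for identical inputs. All file names and the `stamped_output` tool are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def action_key(command: list[str], declared_inputs: dict[str, bytes]) -> str:
    """Hash the command line plus the contents of the *declared* inputs only."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode() + b"\0")
    for path in sorted(declared_inputs):  # sorted, so the key is order-independent
        h.update(path.encode() + b"\0" + digest(declared_inputs[path]).encode())
    return h.hexdigest()


workspace = {"lib.ts": b"export const x = 1;", "config.json": b'{"strict": true}'}
cmd = ["tsc", "lib.ts"]

# Non-hermetic: the compiler secretly reads config.json, but only lib.ts is declared,
# so editing config.json leaves the key (and therefore the cache entry) unchanged.
before = action_key(cmd, {"lib.ts": workspace["lib.ts"]})
workspace["config.json"] = b'{"strict": false}'
after = action_key(cmd, {"lib.ts": workspace["lib.ts"]})
print("key unchanged after editing an undeclared input:", before == after)


# Non-deterministic: a hypothetical tool that embeds a timestamp in its output.
def stamped_output(source: bytes, timestamp: float) -> bytes:
    return source + b"\n// built at " + str(timestamp).encode()


same_input = workspace["lib.ts"]
print("outputs differ for identical inputs:",
      digest(stamped_output(same_input, 1.0)) != digest(stamped_output(same_input, 2.0)))
```

In the first case you get stale results; in the second, every dependent action sees a "changed" input and re-runs even though nothing meaningful changed.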

As a side benefit of describing your dependencies, sometimes you'll also discover undesired dependencies, so you can fix those and/or add constraints to prevent bad dependencies from being introduced in your code.
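Bazel's built-in mechanism for such constraints is the `visibility` attribute, which limits who may depend on a target (with specifications like `//backend:__subpackages__`). The check below is a simplified model of that idea, not Bazel's actual semantics: each hypothetical target lists the packages allowed to depend on it, and we report every edge that breaks the rule.

```python
# Hypothetical targets: label -> (deps, packages allowed to depend on this target).
GRAPH: dict[str, tuple[list[str], list[str]]] = {
    "//backend/internal:db": ([], ["//backend"]),
    "//backend/api:server": (["//backend/internal:db"], ["//"]),
    "//frontend/app:ui": (["//backend/api:server", "//backend/internal:db"], ["//"]),
}


def package_of(label: str) -> str:
    """'//frontend/app:ui' -> '//frontend/app'."""
    return label.split(":", 1)[0]


def allowed(pkg: str, prefix: str) -> bool:
    """True if pkg is the prefix package itself or one of its subpackages."""
    return prefix == "//" or pkg == prefix or pkg.startswith(prefix + "/")


def violations(graph: dict[str, tuple[list[str], list[str]]]) -> list[tuple[str, str]]:
    """Return (dependent, dependency) edges that break the declared constraints."""
    bad = []
    for label, (deps, _) in graph.items():
        for dep in deps:
            permitted = graph[dep][1]
            if not any(allowed(package_of(label), p) for p in permitted):
                bad.append((label, dep))
    return bad


print(violations(GRAPH))
```

Here the frontend reaching into the backend's internal database layer is flagged, while going through the backend's public API target is fine.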


Keeps your outputs up-to-date

In exchange for your work in describing your dependencies, you get a fantastic property: fast, incremental, and correct outputs.

Your outputs are a filesystem tree, usually in the `bazel-out` folder. Bazel populates some subset of this tree depending on what you ask for. If you ask for the default outputs of a Java library, Bazel places a `.jar` file in the output tree. If you ask for a test to be run, Bazel places the exit code of that test runner in the output tree (representing the pass/fail status).

Bazel does the minimum work required to update the output tree. In the trivial case, Bazel queries the dependency graph, determines that the inputs to a given step are the same as in a previous build, and does no work. This "cache hit" is the common case. If there's no cache hit locally on your machine, Bazel can fetch one from a remote cache, if one is configured.
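The lookup order can be sketched as a two-tier cache keyed by an action's key: check the local cache, then the shared remote cache, and only run the action on a miss in both. This is a hypothetical model, not Bazel's remote caching protocol; the `remote` dict simply stands in for a cache service shared by CI and developers.

```python
from typing import Callable


class TwoTierCache:
    """Check the local cache, then a shared remote cache, then run the action."""

    def __init__(self, remote: dict[str, bytes]):
        self.local: dict[str, bytes] = {}
        self.remote = remote  # stand-in for a cache shared by CI and developers
        self.executions = 0   # how many times we actually had to do the work

    def get_or_run(self, key: str, run: Callable[[], bytes]) -> bytes:
        if key in self.local:
            return self.local[key]
        if key in self.remote:
            self.local[key] = self.remote[key]  # "download" the remote hit
            return self.local[key]
        self.executions += 1
        output = run()
        self.local[key] = output
        self.remote[key] = output  # "upload" so others get a cache hit
        return output


shared: dict[str, bytes] = {}
ci = TwoTierCache(shared)
ci.get_or_run("key-123", lambda: b"compiled.jar")      # CI does the work once
laptop = TwoTierCache(shared)
laptop.get_or_run("key-123", lambda: b"compiled.jar")  # developer gets a remote hit
print(ci.executions, laptop.executions)
```

The print shows `1 0`: the work was done once on CI, and the developer's machine reused the result.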

If you change one file, then any nodes in the dependency graph that directly depend on it must be re-evaluated. That might mean re-running a compiler. However, if the result is the same as in a previous run, there's nothing more to do for the targets that depend on it. This avoids "cascading rebuilds", where a whole spine of the tree is re-evaluated.
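This short-circuit is often called "early cutoff". Here's a minimal sketch, assuming a hypothetical three-step chain (compile, link, test) where each step remembers the digest of its last input and its last output. A step re-runs only when its input digest changed, so a comment-only edit, which leaves the toy compiler's output unchanged, stops the rebuild after the first step.

```python
import hashlib


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def compile_lib(src: bytes) -> bytes:
    # Toy "compiler": drops comment lines, so comment edits don't change its output.
    return b"\n".join(line for line in src.splitlines() if not line.startswith(b"#"))


def link_app(lib: bytes) -> bytes:
    return b"app(" + lib + b")"


def run_tests(app: bytes) -> bytes:
    return b"PASS" if app.startswith(b"app(") else b"FAIL"


STEPS = [("lib", compile_lib), ("app", link_app), ("test", run_tests)]


def build(source: bytes, memo: dict[str, tuple[str, bytes]]) -> list[str]:
    """Run the chain, skipping any step whose input digest is unchanged."""
    executed = []
    data = source
    for name, fn in STEPS:
        key = h(data)
        if name in memo and memo[name][0] == key:
            data = memo[name][1]  # cache hit: reuse the previous output
        else:
            data = fn(data)
            memo[name] = (key, data)
            executed.append(name)
    return executed


memo: dict[str, tuple[str, bytes]] = {}
print(build(b"# v1\nx = 1", memo))  # first build runs every step
print(build(b"# v2\nx = 1", memo))  # comment edit: only "lib" re-runs
print(build(b"# v2\nx = 2", memo))  # real change: the whole chain re-runs
```

The three calls return `['lib', 'app', 'test']`, `['lib']`, and `['lib', 'app', 'test']` respectively.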

There are a lot of things you can do with an incrementally updated output tree. For example, you can set up your CI to just run `bazel test //...` (test everything) and rely on Bazel's incrementality and caching to ensure that only the minimal build and test work happens for each change.

There's a lot more to Bazel, but I find this description fits well in a two-minute attention span and conveys the basic value proposition.

CBOI: Continuous Build, Occasional Integration

Is your organization practicing CBOI? If you haven't heard this hot new industry acronym, it stands for "Continuous Build, Occasional Integration." A lot of big companies are using this technique. It's a different way of approaching Continuous Integration (CI).

By different, I mean a lot worse.

In fact, your organization should *not* practice CBOI. So why write an article about it? Because, sadly, most organizations that claim to do CI are actually doing CBOI. I'll explain why that is, and how you can stop.

What is CI?

Let's break down the terms a bit to start. "Continuous" is just a way of saying "infinite loop" - we trigger on every change or on a regular interval, and give feedback to the development cycle, such as alerting developers that they broke an automated test. Easy, and not controversial.

"Integration" is a much more nuanced term. In most software shops, what we mean here is that we bring together the artifacts from independent engineering teams into a functioning system. A common example that I'll use in this article is a Frontend and a Backend.

In a small organization, with only a few developers, Integration isn't much of a problem. Every engineer develops on the whole stack, and runs the complete system locally. As the organization scales, however, teams break up and specialize. The full system is eventually too complex to fit in one person's head, though the Architect tries mightily. The more the org structure gets broken up, the more different software systems diverge and the harder it is to guarantee that the code they're writing works when integrated.

In order to perform Continuous Integration, then, you need an automated way to integrate the full stack. In working with a number of large companies, I've rarely observed this automation. Instead, individual developers just work on their own code (not surprising, since they would prefer to work in isolation, reducing their cognitive load and learning curve). They aren't able to bring up other parts of the system, for a variety of reasons I'll list later. However, the engineers know (or their managers instruct them) to set up a "CI" for their code. So they take the build and test system they use locally and put it on a server running in a loop. In our example, the backend team runs their backend tests on Jenkins.

Is that CI? There's an easy litmus test to determine that.

How to tell if you're doing CBOI rather than CI

Let's say the backend team makes a change that will break the frontend code. To avoid certain objections, I'll add that this change isn't something we expected to be part of the API contract between these layers: let's say we just caused the ordering of results from a query to change. At what point in your development cycle will you discover the problem?
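To make the scenario concrete, here's a hypothetical sketch. The backend never promised an order, the frontend quietly relies on the first row being the newest, and the backend team's own test (schema and row count) passes for both the old and new versions. Only running the frontend logic against the new backend reveals the defect. All functions and data are made up for illustration.

```python
# Hypothetical data returned by a backend query.
ROWS = [{"id": 1, "ts": 100}, {"id": 2, "ts": 300}, {"id": 3, "ts": 200}]


def backend_v1() -> list[dict]:
    # Old implementation happened to return rows newest-first.
    return sorted(ROWS, key=lambda r: r["ts"], reverse=True)


def backend_v2() -> list[dict]:
    # "Harmless" refactor: same rows, same schema, now ordered by id.
    return sorted(ROWS, key=lambda r: r["id"])


def frontend_latest(rows: list[dict]) -> int:
    # Frontend assumption that was never part of the API contract.
    return rows[0]["id"]


# The backend team's own test checks schema and row count, so both versions pass.
for backend in (backend_v1, backend_v2):
    rows = backend()
    assert len(rows) == 3 and all(set(r) == {"id", "ts"} for r in rows)

# Only an integrated check notices that the "latest" item changed.
print(frontend_latest(backend_v1()))  # 2, the newest item
print(frontend_latest(backend_v2()))  # 1, no longer the newest item
```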

In organizations doing CBOI, the answer is that they'll find out in production when customers discover the defect. That's because the automation couldn't run the frontend tests against the HEAD version of the backend, and since the change appeared API-compatible, no one tried to manually verify it either. When you're discovering your bugs in prod, you should start asking the hard questions in your post-mortem: why didn't our CI catch this? And in our example, the answer shocks our engineers: they didn't have CI after all.

Instead of CI, their setup was individual teams testing their code in a loop, which is a Continuous Build (CB). Then when they released to prod, the Release Engineer performed the actual integration, by putting the code from different teams together in the finished system. They only do those releases on a less-frequent cadence. That's Occasional Integration (OI).

If a developer wanted to debug the problem, they'd be forced to "code in production". With no way to reproduce the full stack, they have to push speculative changes and look at production logs to see if they've fixed it. SSH'ing into a production box to make edits is the opposite of what we want. For space, I won't go into details on this as it merits a separate article (and is maybe obvious to you).

So we've finally defined what CBOI is, and seen how it causes production outages and scary engineering practices. Ouch!

How to stop doing CBOI

I have to start this section with a warning: it isn't going to be easy. The Continuous Build was set up because it was trivial: take the build/test tool the developers were running for their code and put it on a server in a loop. There isn't a similarly easy way to integrate the full stack. It may even require some changes to your build/test tools, or to the entry point of your software. However, if your organization has a problem with defects in production (or wants to avoid such a problem), this work is worth doing.

Also, although the example so far was a Frontend and a Backend, which are runnable applications, CI is just as important for other vertices of your dependency graph, such as shared libraries or data model schemas.

I'll break this down into a series of problems:

  1. Developers can't run the full stack
  2. No integration test fixture exists that can detect the defect
  3. Resource constraints make it uneconomical to run all the tests

Along the way (spoiler alert) I'll explain how one Integration tool solves the technical problems. However, we'll conclude with a final problem, the people problem:

  4. The organization is averse to integrating dev processes

People problems are always harder than software problems, as I learned from early Google luminary Bill Coughran.

Why devs can't run the full stack

As I mentioned earlier, our ideal integration happens on the developer's machine. After making that non-order-preserving backend change, you'd just run the frontend tests to discover the breakage. In practice this is much harder than it should be.

First, you might need your machine in a very particular state. You need compilers and toolchains installed at just the right versions, built against the right system headers and libraries, and running on an OS that's compatible with prod. Most teams don't have up-to-date "onboarding" instructions that carefully cover this, and since the underlying systems are always churning, you don't even know whether your instructions will work for the next person trying to run your code.

Next, many systems require shared runtime infrastructure ("the staging environment") or credentials. These either aren't made available to engineers, or they're a contended resource where only one person can have their changes running at a time.

It's also common that knowledge of how to bring up a fresh copy of the system isn't written down anywhere, and hasn't been scripted. Only the sysadmin has the steps roughly documented in an unsaved notepad.exe buffer, so when you need to bring up a server, that person clicks around the AWS UI to do so.

To solve these problems, and unlock your developers' ability to run the whole system, you need:

  • A tool like Bazel that manages the toolchains and keeps the configuration roughly hermetic, so a dev can "parachute" into someone else's code and run it at HEAD without any setup to maintain.
  • The ability to cheaply spin up a new environment anywhere. For example if you deploy to a Kubernetes cluster, use something like minikube to make a miniature local environment that mimics production and re-uses most of the same configs.
  • Robust scripting that automates the release engineer's job. It should be possible for a test to run the same setup logic to make a fresh copy of the system under test.

The configurations need to be "democratized" for this to work well. Under Jenkins you might have had some centralized Groovy code that looks at changed directories or repositories and determines which tests to run. This doesn't scale in a big org where many engineers have to edit these files. Instead, push configuration out to the leaves as much as possible: co-locate the build and test description for some code in the nearest common ancestor directory of its inputs. Bazel's `BUILD.bazel` files are a great example of how to do this.

Integration test fixtures

Remember that tests are written in three parts, sometimes called "Arrange, Act, Assert". The first part is to bring up the "System under test" (SUT).

In order to assert that the frontend and backend work together, our automated test first needs to integrate them by building both at HEAD and running them in a suitable environment, wired up so they can reach each other for API calls. To build these dependencies from HEAD, you'll need a high-level, language-agnostic tool to orchestrate the builds. Again, Bazel is great for this.
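Here's a minimal sketch of such a fixture following Arrange, Act, Assert, using only the Python standard library. It assumes, for illustration only, that both halves can be stubbed in-process: `BackendHandler` stands in for the backend built at HEAD and `frontend_headline` for the frontend logic under test. In a real setup, the build tool would build both at HEAD and the fixture would launch those artifacts instead.

```python
import json
import threading
import unittest
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class BackendHandler(BaseHTTPRequestHandler):
    """Hypothetical backend at HEAD: serves items newest-first."""

    def do_GET(self):
        body = json.dumps([{"id": 2, "ts": 300}, {"id": 1, "ts": 100}]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass


def frontend_headline(base_url: str) -> str:
    """Hypothetical frontend logic: show the newest item's id."""
    with urllib.request.urlopen(base_url + "/items") as resp:
        items = json.load(resp)
    return f"Latest: #{items[0]['id']}"


class FrontendBackendIntegrationTest(unittest.TestCase):
    def setUp(self):
        # Arrange: bring up the system under test on a free port (port 0).
        self.server = ThreadingHTTPServer(("127.0.0.1", 0), BackendHandler)
        self.thread = threading.Thread(target=self.server.serve_forever, daemon=True)
        self.thread.start()
        host, port = self.server.server_address
        self.base_url = f"http://{host}:{port}"

    def tearDown(self):
        self.server.shutdown()
        self.server.server_close()

    def test_headline_shows_newest_item(self):
        # Act: exercise the frontend against the running backend.
        headline = frontend_headline(self.base_url)
        # Assert: the expectation that spans both systems.
        self.assertEqual(headline, "Latest: #2")
```

Save it as a module and run it with `python -m unittest`. The `setUp` and `tearDown` pair is the reusable part: once it exists, additional integration tests are just more `test_` methods.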

You'll find there is natural resistance here: the "first mover" cost is very high. An engineer could easily spend a week writing one test to catch the ordering defect I mentioned earlier. In the scope of that post-mortem, someone will object "we can't possibly make time for that." But of course, the fixture is reusable, and once it's written you can add more true "integration tests", even writing them at the same time you make software changes rather than as regression tests for a post-mortem.

If the code is in many repositories, that also introduces a burden. You'll either need some "meta-versioning" scheme that says what SHA of each repo to fetch when integrating, or you'll need to co-locate the code into a single monorepo (which has its own cost/benefit analysis).
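A "meta-versioning" scheme can be as small as a manifest pinning each repository to one exact commit. The sketch below is hypothetical (repository names and `example.com` URLs are placeholders): it rejects floating refs like `main`, which would make an integration point drift over time, and turns the manifest into ordinary `git clone` and `git checkout` commands without running them.

```python
import re

FULL_SHA = re.compile(r"[0-9a-f]{40}")

# Hypothetical "meta-version": one exact commit per repository (placeholder SHAs).
MANIFEST = {
    "frontend": ("https://git.example.com/frontend.git", "1" * 40),
    "backend": ("https://git.example.com/backend.git", "2" * 40),
}


def checkout_commands(manifest: dict[str, tuple[str, str]]) -> list[list[str]]:
    """Turn the manifest into git commands that reproduce one integration point."""
    commands = []
    for name, (url, sha) in sorted(manifest.items()):
        if not FULL_SHA.fullmatch(sha):
            # A branch name like "main" moves over time, so the result isn't reproducible.
            raise ValueError(f"{name}: {sha!r} is not a full commit SHA")
        commands.append(["git", "clone", url, name])
        commands.append(["git", "-C", name, "checkout", sha])
    return commands


for command in checkout_commands(MANIFEST):
    print(" ".join(command))
```

Someone still has to bump these pins, which is exactly the coordination cost that pushes some organizations toward a monorepo instead.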

Not economical to run all the tests

The last technical problem I'll mention is test triggering. In the CBOI model, you only needed to run the backend tests when the backend changed, and the frontend tests when the frontend changed. And they were smaller tests that only required a single system in their test fixture. CI is going to require that we write tests with heavier fixtures, and run them on more changes.

Triggering across projects is tricky. Our goal is to avoid running all the tests every time, but to run the "necessary" ones. You could write some logic that says "last time we touched that backend we broke something, so those changes also trigger this other CI". This logic is likely flawed and quickly rusts, so I don't think it's a good strategy. You could automate that logic using some heuristics, like Launchable does. But to make this calculation reliably correct, ensuring that *all* affected tests are run for a given change, you need a dependency graph. Bazel is great for expressing and querying that graph, for example finding every test that transitively depends on the changed sources.
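The core of that calculation is a reverse-dependency walk: start from the changed files and follow edges "upward" to everything that depends on them, keeping the tests. In Bazel this is typically expressed with `bazel query` and its `rdeps` function; the sketch below models the same idea on a small hypothetical graph, with a naming convention (`_test` suffix) standing in for real rule kinds.

```python
from collections import deque

# Hypothetical graph: target -> the targets (or source files) it depends on.
DEPS = {
    "//backend:server": ["backend/server.go"],
    "//backend:server_test": ["//backend:server"],
    "//frontend:app": ["frontend/app.ts"],
    "//frontend:app_test": ["//frontend:app"],
    "//e2e:integration_test": ["//frontend:app", "//backend:server"],
}


def affected_tests(changed_files: set[str], deps: dict[str, list[str]]) -> set[str]:
    """Return every *_test target that transitively depends on a changed file."""
    # Invert the edges: for each node, which targets depend on it directly?
    rdeps: dict[str, list[str]] = {}
    for target, inputs in deps.items():
        for dep in inputs:
            rdeps.setdefault(dep, []).append(target)
    # Breadth-first search from the changed files toward their dependents.
    seen = set(changed_files)
    queue = deque(changed_files)
    while queue:
        node = queue.popleft()
        for parent in rdeps.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return {t for t in seen if t.endswith("_test")}


print(sorted(affected_tests({"backend/server.go"}, DEPS)))
```

A backend-only change selects `//backend:server_test` and `//e2e:integration_test` but not the frontend's unit test: the integration test runs precisely because the graph says it depends on both sides.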

In a naive solution, it's also too slow to build everything from HEAD. You need a shared cache of intermediate build artifacts. Bazel has a great remote caching layer that can scale to a large monorepo, ensuring that you keep good incrementality.

Organization Averse to Integrating

Lastly, I mentioned there's a non-technical problem as well. Even with clever engineers and the right tools, like Bazel, this might be what sinks your effort.

Engineers want to work in isolation from each other. For example, the backend engineers think JavaScript is a mess and don't want to learn anything about frontend code. Engineers are amazingly tribal! Try asking a Mac user to develop on Windows or vice-versa.

To do CI, we're asking the backend engineers to look at the frontend test results when something is red, to determine whether their changes caused a regression. We're asking the frontend engineers to wait for a build of the backend to run their tests against. These teams never had to work closely together in the past.

Worse, we're also asking the managers to act differently. This is an infrastructure investment for the future, requiring some plumbing changes in the build system. So only an organization willing to make strategic decisions will be able to prioritize and consistently staff their CI project. Also, the managers from different parts of the org will have to reach some technical agreement between their teams about standardizing on build/test tooling that can span across projects. This may run into the same friction you always have when making shared technical decisions.

Epilogue: coverage

I like to beat up on test coverage as a metric, because it measures only which lines of code were executed, not whether anything was asserted about them. In the context of CBOI, test coverage is also misleading. You might have 100% test coverage of the frontend and 100% test coverage of the backend, but 0% coverage of the defects that only appear when the two are integrated. I think this contributes to the misunderstanding among engineering managers.
