Most of the resources I ran across on the internet while reading about multi-stage builds tout the benefit of smaller images. While that is a great feature, I’ve benefited from other, possibly underrated, side-effects of multi-stage builds: caching and parallelism. In my opinion[1], these two offer a much better user experience during quick experimentation cycles. A lot of this advice depends heavily on context, so it applies selectively[1].
Skip this part if you already know what multi-stage builds are.
Every Dockerfile needs a FROM instruction, and it can take other local sources. For example, say I have the following Dockerfile I call “base”:
which may be used in other dockerfiles:
There are two main points of interest:
Dockerfile.helloworld is dependent on Dockerfile.base: every time the base is rebuilt (and has some changes), the helloworld image also needs to be rebuilt.
Opposite, but similar, logic applies when only the helloworld image has to be rebuilt: there’s no need to build the base image if it already exists.
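To make that dependency concrete, here is a minimal sketch of what Dockerfile.helloworld could look like. It assumes the image from Dockerfile.base has already been built and tagged locally as base; the tag name and the CMD payload are illustrative assumptions, not taken from the original files.

```dockerfile
# Dockerfile.helloworld (sketch)
# Assumes the base image exists locally, built with something like:
#   docker build -f Dockerfile.base -t base .
FROM base

# Hypothetical payload, only here so the image does something.
CMD ["echo", "hello world"]
```

Rebuilding base with changes produces a new image, so the FROM line above resolves to something different and helloworld has to be rebuilt. Editing only the CMD line leaves base untouched.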
Multi-stage builds are, at a fundamental level, similar to having multiple smaller base images[2]. This brings me to the best advantages I’ve seen for multi-stage builds: caching and parallelism.
Consider the following Dockerfile. The objective is to install the Go and Rust toolchains, picked so that the Dockerfile doesn’t end up too long or complicated[3]. The built image shall be used on a CI server as the test/build environment for our fancy app[4].
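As a rough sketch of such a single-stage file: the base image tag, the Go version, and the default rustup install location below are assumptions chosen for illustration, not the exact contents used for the runs in this post.

```dockerfile
# Single-stage sketch: each instruction builds on the cached layer above it.
# Assumptions: debian:bookworm-slim base, Go 1.22.5 for linux-amd64,
# rustup installing into root's home directory (/root/.cargo, /root/.rustup).
FROM debian:bookworm-slim

RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*

# Rust toolchain via the rustup installer script.
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \
    | sh -s -- -y --profile minimal

# Go toolchain from the official tarball. This layer sits on top of the
# Rust layer, so any edit to the Rust instruction invalidates it too.
RUN curl -fsSL https://go.dev/dl/go1.22.5.linux-amd64.tar.gz \
    | tar -C /usr/local -xz

ENV PATH="/root/.cargo/bin:/usr/local/go/bin:${PATH}"
```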
Second run (without changing anything):
This is fine so far. Docker caches each of the layers (there’s a useful label in the output too!). However, the moment you change even a single character, the cache gets busted—even when there’s no change in the actual script content. For instance:
I moved up the URL part in the rustup build to be in line with the curl command, and that busted the cache for the next step, where I install Go. This is a trivial example, but this happens (inadvertently, in a lot of cases) and is frustrating. Even if we look at the installers, know what they do, and are sure that the two steps are independent of each other, Docker can’t know that in this format.
Unless we use stages, that is. If the go and rust installer steps were in different stages, Docker could determine right at the Dockerfile parse step that they are independent. As long as a stage doesn’t directly use the artifacts of another one, cache busting in one won’t affect the other.
The same dockerfile can be written using multi-stage builds as:
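A minimal sketch of such a layout, under assumptions of my own: a Debian base image, Go 1.22.5, and rustup's default install paths for root. The stage names are illustrative too.

```dockerfile
# Shared starting point for the toolchain stages.
FROM debian:bookworm-slim AS base
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*

# Rust stage: depends only on base.
FROM base AS rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \
    | sh -s -- -y --profile minimal

# Go stage: depends only on base, never on the rust stage, so editing
# the Rust instruction above leaves this stage's cache intact.
FROM base AS go
RUN curl -fsSL https://go.dev/dl/go1.22.5.linux-amd64.tar.gz \
    | tar -C /usr/local -xz

# Final image: pull the artifacts out of both stages.
FROM base
COPY --from=rust /root/.cargo /root/.cargo
COPY --from=rust /root/.rustup /root/.rustup
COPY --from=go /usr/local/go /usr/local/go
ENV PATH="/root/.cargo/bin:/usr/local/go/bin:${PATH}"
```

Only the final stage references the rust and go stages, through COPY --from. With BuildKit as the builder, the two toolchain stages can also be built concurrently, since neither uses the other's output.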
The possible separation of the rust and go steps also leads to the possibility of parallelization, which improves build times overall. This is a little hard to demonstrate using shell output, so I’ve put up a bunch of asciicasts showing the effect:
Rebuild of the above Dockerfile with the positions of the rust and go sections interchanged[6]: Asciicast, Dockerfile. This build is a cached one too. Without multi-stage builds, that would not be the case, as shown in the example in the caching section.
Although I used modern, simpler installers for the examples, a similar strategy could be used for dynamic languages. There might be a post in the future…
[1] …mostly humble, but backed by some experience and context. The Docker images we use at work set up the environment for our multi-tenant Jenkins build servers. These Dockerfiles have lots of build steps for languages, so rebuilds of the images tend to be slow unless optimized well. This makes for a 💩 experience when trying to change anything.
[2] The major difference between multiple base images and multi-stage builds is everyone’s favourite feature of smaller builds (using the COPY --from instruction). Achieving smaller images is also possible with multiple base images; that approach has a name, the builder pattern, and is mentioned in the documentation for multi-stage builds.
[3] In one of our work projects, I have to install Ruby and Python instead, which prompted this journey into trying to speed up the build. The steps involve compilation from source and the script is…complicated.
[4] …that requires 3 languages at minimum to build artifacts that end up as a website at some point. SMH.
[5] Compiling Ruby and Python, or installing them using the official Debian packages, results in a folder soup of targets that need to be COPY-ed from. That experience is painful, at best.