Thin (ish) Clojure jars for better docker containers
Recently I’ve been creating “thin” jars for my Java applications, in order to avoid the waste caused by putting uberjars in containers. Unlike an uberjar, a “thin” jar contains only our application code, with all the dependencies split out and referenced in the jar’s manifest file. This lets us store the app code in a separate docker layer, saving all the storage & network IO that we’d otherwise waste re-transferring those unchanged dependencies.
Despite this, it’s still common practice to deploy Clojure apps as uberjars, so I thought I’d try implementing the same dependency/app-jar separation for my Clojure apps. Wondering why such an option isn’t already a built-in function for tools.build, I thought it would be as simple as recreating Maven’s copy-dependencies step in Clojure.
The bad news: we can’t get all the way there; the good news: we can get pretty close - at the very least, a concrete improvement over containerized uberjars.
The Problem
Given that both Java & Clojure apps ultimately get compiled down to JVM bytecode, in theory we should be able to apply to Clojure apps the same tricks we can use for Java. There’s an unexpected catch though: Clojure’s compiler is non-deterministic in its output, i.e. it can output different class files across different runs for the same input.
Any differences in this output don’t affect the behaviour of the compiled program, but a difference of even a single byte is enough to invalidate layer caching. So we could create a “thin” jar and split it into a different layer of our docker image, but we wouldn’t actually benefit: each new build would still require transfer & storage of the same amount of data as an uberjar, losing the very benefits we are hoping to achieve.
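To see this effect on a real project, one option is to build the same commit twice and compare the two jars entry by entry. The following is a minimal sketch rather than part of the build: the jar-diff namespace and its function names are made up for illustration, and it compares entry contents only (it ignores timestamps and other zip metadata).

```clojure
(ns jar-diff
  (:require [clojure.java.io :as io])
  (:import (java.io InputStream)
           (java.security MessageDigest)
           (java.util.zip ZipEntry ZipFile)))

(defn- sha-256
  "Hex-encoded SHA-256 of everything readable from the input stream."
  [^InputStream in]
  (let [digest (MessageDigest/getInstance "SHA-256")
        buffer (byte-array 8192)]
    (loop []
      (let [n (.read in buffer)]
        ;; .read returns -1 at end of stream
        (when-not (neg? n)
          (.update digest buffer 0 (int n))
          (recur))))
    (apply str (map #(format "%02x" %) (.digest digest)))))

(defn entry-hashes
  "Map of entry name -> SHA-256 for every non-directory entry in the jar."
  [jar-path]
  (with-open [zip (ZipFile. (io/file jar-path))]
    ;; into {} is eager, so every entry is read before the jar is closed
    (into {}
          (for [^ZipEntry entry (enumeration-seq (.entries zip))
                :when (not (.isDirectory entry))]
            [(.getName entry)
             (with-open [in (.getInputStream zip entry)]
               (sha-256 in))]))))

(defn changed-entries
  "Sorted names of entries whose contents differ between the two jars,
  including entries present in only one of them."
  [jar-a jar-b]
  (let [a (entry-hashes jar-a)
        b (entry-hashes jar-b)]
    (sort (filter #(not= (a %) (b %))
                  (into (set (keys a)) (keys b))))))
```

Calling (changed-entries "build-1/my-app.jar" "build-2/my-app.jar") on two builds of the same source (the paths are hypothetical) lists whichever class files came out differently. An empty result means the entry contents were identical for that pair of builds.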
A (partial) Solution
All is not lost, however; we can still split out any jars containing already-compiled bytecode, creating a slimmer uberjar containing only bytecode generated from our Clojure application code and any Clojure libraries it uses.
In the build.clj file below, we:
- Work out which libraries contain already-compiled code (as opposed to only .clj source files)
- Copy the precompiled library jars to a separate lib directory
- Compile any remaining Clojure dependencies & our app code, and put just those into an uberjar
- Reference the jars in the lib directory in the uberjar’s manifest file
(ns build
  (:require [clojure.tools.build.api :as b]
            [clojure.string :as str]
            [clojure.java.io :as io])
  (:import java.util.zip.ZipFile))

;; Entry-point namespace: compiled ahead of time and used as the jar's Main-Class.
(def main 'my-app.core)

(defn- contains-class?
  "Returns the name of the first .class entry in the jar at absolute-path,
  or nil if the jar contains none."
  [absolute-path]
  (with-open [zip-file (ZipFile. absolute-path)]
    ;; .entries returns an Enumeration, hence enumeration-seq
    (->> (enumeration-seq (.entries zip-file))
         (remove #(.isDirectory %))
         (map #(.getName %))
         (filter #(str/ends-with? % ".class"))
         first)))

(defn copy-dependencies
  "Copies every jar belonging to the basis' libs into target/lib."
  [{:keys [libs] :as _basis}]
  (let [paths (mapcat (comp :paths val) libs)]
    (doseq [src paths]
      (let [target (str "target/lib/" (.getName (io/file src)))]
        (b/copy-file {:src src :target target})))))

(defn- split-basis
  "Returns [precompiled-basis uncompiled-basis]. Only :libs differs between
  the two; the rest of the basis is left untouched."
  [{:keys [libs] :as basis}]
  (let [grouped (group-by
                 (fn [[_sym {:keys [paths]} :as _lib]]
                   (boolean (contains-class? (first paths))))
                 libs)]
    [(assoc basis :libs (into {} (get grouped true)))
     (assoc basis :libs (into {} (get grouped false)))]))

(defn jar
  ([] (jar {}))
  ([opts]
   (b/delete {:path "target"})
   (let [[precompiled-basis uncompiled-basis]
         (split-basis (b/create-basis opts))]
     (println "Copying source...")
     (b/copy-dir {:src-dirs ["src" "resources"]
                  :target-dir "target/classes"})
     (println "Copying precompiled dependency jars...")
     (copy-dependencies precompiled-basis)
     (println (str "Compiling " main "..."))
     (b/compile-clj {:basis uncompiled-basis
                     :class-dir "target/classes"
                     :src-dirs ["src"]
                     :ns-compile [main]})
     (println "Building jar")
     (b/uber {:basis uncompiled-basis
              :class-dir "target/classes"
              :main main
              ;; Point the JVM at the jars copied to target/lib
              :manifest {"Class-Path" (->> (.listFiles (io/file "target/lib"))
                                           (map #(str "lib/" (.getName %)))
                                           (str/join " "))}
              :uber-file "target/my-app.jar"}))))
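To run a build script like the one above, the project needs tools.build available under an alias. A deps.edn along these lines is one way to wire that up; treat it as a sketch, since the :build alias name and the version numbers are assumptions rather than anything this post prescribes.

```clojure
;; deps.edn (sketch): the alias name and versions below are assumptions;
;; pin whichever versions your project actually uses.
{:paths ["src" "resources"]
 :deps {org.clojure/clojure {:mvn/version "1.11.1"}}
 :aliases
 {:build {:deps {io.github.clojure/tools.build {:mvn/version "0.9.6"}}
          ;; lets the CLI resolve unqualified function names (e.g. jar) in build.clj
          :ns-default build}}}
```

With that in place, clojure -T:build jar invokes the jar function, and java -jar target/my-app.jar should start the app as long as target/lib sits next to the jar.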
This gives us a smaller app jar, alongside the dependency jars that already contain compiled bytecode. (NB this includes Clojure core!)
.
├── my-app.jar
└── lib
    ├── asm-9.2.jar
    ├── avro-1.11.0.jar
    ├── checker-qual-3.8.0.jar
    ├── clojure-1.11.1.jar
    ├── commons-compress-1.21.jar
    ...
    └── zstd-jni-1.5.2-1.jar
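Because the JVM resolves relative Class-Path entries against the location of the jar itself, it can be worth checking that every jar named in the manifest really does sit next to my-app.jar. The following is a small sketch of such a check; the manifest-check namespace and its function names are hypothetical, not part of tools.build.

```clojure
(ns manifest-check
  (:require [clojure.java.io :as io]
            [clojure.string :as str])
  (:import (java.util.jar JarFile)))

(defn class-path-entries
  "Entries of the jar's Class-Path manifest attribute, or [] if it has none."
  [jar-path]
  (with-open [jar (JarFile. (io/file jar-path))]
    (let [value (some-> (.getManifest jar)
                        .getMainAttributes
                        (.getValue "Class-Path"))]
      (if (str/blank? value)
        []
        (str/split (str/trim value) #"\s+")))))

(defn missing-jars
  "Class-Path entries that do not exist relative to the jar's own directory."
  [jar-path]
  (let [jar-dir (.getParentFile (.getAbsoluteFile (io/file jar-path)))]
    (remove #(.exists (io/file jar-dir %))
            (class-path-entries jar-path))))
```

For example, (missing-jars "target/my-app.jar") returning an empty sequence suggests the manifest and the lib directory agree.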
With this in place, we just need to make sure our Dockerfile copies over the libs in a separate layer to the uberjar, like so:
COPY --from=app-build /app/target/lib /app/lib
COPY --from=app-build /app/target/my-app.jar /app/my-app.jar
As a result, each time our CI server builds a new Docker image we go from something like this:
5f62416f87c9: Pushing [==================================================>] 31.95MB
1668bc9c2910: Layer already exists
e6857d4762fd: Layer already exists
3af14c9a24c9: Layer already exists
to something like this:
acd5c6ddc038: Pushing [==================================================>] 4.705MB
8bf418842124: Layer already exists
1668bc9c2910: Layer already exists
e6857d4762fd: Layer already exists
3af14c9a24c9: Layer already exists
i.e. each new build results in less network IO and image repo storage; the cost is an extra image layer, and a slightly more involved build.clj. YMMV of course, depending on your app’s dependencies - but Clojure apps tend to depend upon a large stack of Java dependencies, and so savings like the above wouldn’t be uncommon.
What about deploying Clojure sources?
Of course, there is an alternative that would give us a truly thin application layer: skip AOT compilation altogether and deploy Clojure sources. The tradeoff is build & infrastructure cost against application startup time, since the sources must then be compiled each time the app starts. This is a tradeoff potentially worth making - but if you’re already deploying uberjars, then you’re probably doing so because you want to avoid that compilation at application start.
Conclusion
Compared to containerizing an uberjar, we get:
- Faster app upgrade times
- Reduced image registry storage and costs
- No compromise to app startup time
While we sadly can’t achieve the same image size efficiency gains as with a Java application, there are definitely still gains to be had - and they come at just the cost of some extra lines of code in our build.clj file.