A Practical Guide to test.check
Here’s a pragmatic guide to generative testing in Clojure using test.check, oriented around spec.
One of spec’s main selling points is that it can be used for validation,
instrumentation and generative testing, but in practice I don’t see very many
codebases using spec also taking advantage of its integration with test.check.
So if you’re already using spec - or at least familiar with it - then this guide
is for you. But even if not, hopefully this guide will still give you a much
better understanding of test.check
and generative testing in Clojure.
Table of Contents
- Section 1: Quick Start
- Section 2: Key Features
- Section 3: Tips for writing good generative tests
- Section 4: Creating generators
  - Key functions recognized by spec
  - Using `s/and` & `s/or`
  - Ordering of arguments in `s/and`
  - The limits of `s/and`’s internal filtering
  - Using spec ns functions to create generator-friendly specs
  - Creating new generators with `s/with-gen` & `gen/fmap`
  - Using `gen/fmap` for better performance
  - Spec generator quirks & gotchas
  - Using ad-hoc generated data in example-based tests
- Conclusion
Section 1: Quick Start
What is `test.check`?
In short, `test.check` is a Clojure library for performing generative tests.
Generative tests - also known as property-based tests - are tests that run our
code under test against random inputs, then check general properties that should
apply across the range of possible randomly generated data. In contrast, we
refer to traditional unit tests that operate on fixed inputs as example-based
tests.
Your first generative test
To whet your appetite, let’s start with a simple example.
First off, you’ll need the following dependencies:
;; Main test.check dependency
org.clojure/test.check {:mvn/version "1.1.1"}
;; Optional; extra tools, and (IMO) better clojure.test integration
com.gfredericks/test.chuck {:mvn/version "0.2.13"}
Then we can create the following failing test:
(require '[clojure.spec.alpha :as s]
'[clojure.test :refer [is testing]]
'[clojure.test.check.clojure-test :refer [defspec]]
'[com.gfredericks.test.chuck.clojure-test :as chuck])
(defn broken-sort [coll]
(if (some #{13} coll)
nil
(sort coll)))
(defspec broken-sort-gen-test
(chuck/for-all [input-coll (s/gen (s/coll-of int?))]
(let [output-coll (broken-sort input-coll)]
(testing "Result is in ascending order"
(when (seq input-coll)
(is (apply <= output-coll))))
(testing "The sorted collection contains the same elements"
(is (= (group-by identity input-coll)
(group-by identity output-coll)))))))
(broken-sort-gen-test)
; => Fails! (95% of the time...)
The key things here are:
- We define a test.check unit test using `defspec`. (NB the naming is a bit unfortunate - this has nothing to do with clojure.spec!)
- We can create generators from specs using `s/gen`; we create a spec for a collection of integers on the fly with `(s/coll-of int?)`, then grab a test data generator for it using `s/gen`
- The `for-all` binding vector binds the generated value for each test run; the assertions are run every time a new test value is generated, and by default 100 test values are generated per test run

In your REPL try modifying `broken-sort`; you should find that the only implementation that consistently passes the test is a call to `sort`. Hopefully you can see already how powerful generative testing is; we can really nail down the correctness of the behavior we want with just a single test!
Formatting & linting
Out of the box, the macros we use here don’t play very nicely with cljfmt or clj-kondo. Thankfully though this is easily fixed with a couple of config files:
`.cljfmt.edn`:
{:extra-indents {for-all [[:inner 0]]}}
`.clj-kondo/config.edn`:
{:lint-as {clojure.test.check.clojure-test/defspec clojure.test/deftest
com.gfredericks.test.chuck.clojure-test/for-all clojure.test.check.properties/for-all}}
Vanilla vs `test.chuck`’s `for-all`
In the above example I’ve used `test.chuck`’s version of `for-all`. In short, this is because the vanilla version of `for-all` doesn’t play nicely with using `is` assertions in tests; instead, `test.check`’s built-in `for-all` macro uses the overall truthiness of the body expression.
So, the test above rewritten to use the vanilla `for-all` would look like:
(require '[clojure.test.check.properties :as p])
(defspec broken-sort-gen-test-vanilla-for-all
(p/for-all [input-coll (s/gen (s/coll-of int?))]
(let [output-coll (broken-sort input-coll)]
(and
(testing "Result is in ascending order"
(or (empty? input-coll)
(and (seq output-coll)
(apply <= output-coll))))
(testing "The sorted collection contains the same elements"
(= (group-by identity input-coll)
(group-by identity output-coll)))))))
In particular, note that we need to group both our assertions within an `and` form. This isn’t too bad on its own, but since we’re trying to integrate with `clojure.test` here (by using `defspec`), it makes sense to me to prefer an approach that lets us write tests more consistently.
Therefore from now on I’ll only use the `test.chuck` version of `for-all`. I’ll always explicitly prefix it with the ns alias `chuck` for the sake of clarity, but in your own code you’ll probably want to `refer` whichever version of `for-all` you choose.
Section 2: Key Features
Usually when unit testing we are rightfully wary of using randomly-generated data, since when used naively it can lead to flaky tests and hard-to-diagnose failures. Proper generative testing differs from naive usage of random data by ensuring deterministic behavior and ease of failure analysis.
Test.check makes reliable generative tests possible through the following key features:
- Intelligently calculates the simplest failing inputs - test.check calls this shrinking
- Runs multiple iterations per test run - this helps prevent flakiness
- Intelligently generates data - “simpler” data is generated first
The power of shrinking
One of the most powerful features of `test.check` is shrinking: rather than simply spitting out the first failure it finds, it does some further work to automagically find a simpler failing test input for us.
To see this in action, let’s take a look at the failing test output for our earlier `broken-sort-gen-test` example:
{:shrunk
{:total-nodes-visited 11,
:depth 3,
:pass? false,
:result false,
:result-data nil,
:time-shrinking-ms 1,
:smallest [{input-coll [13]}]},
:failed-after-ms 5,
:num-tests 13,
:seed 1700759647930,
:fail [{input-coll [15 36 -29 93 2 13 -756 -649 360 2]}],
:result false,
:result-data nil,
:failing-size 12,
:pass? false}
Under the `:fail` key we have `[{input-coll [15 36 -29 93 2 13 -756 -649 360 2]}]`, which shows us that this particular test run failed with an input collection of `[15 36 -29 93 2 13 -756 -649 360 2]`. However, test.check automatically does some further digging for us to find that the simplest input collection that fails is `[13]`.
This is very powerful! This result might feel obvious given how contrived our example is, but in more realistic circumstances this capability is very useful indeed:
- We’ve generated a failing test case; even before we take any shrinking into account, right off the bat we’ve found a failure that we might not have found through traditional example-based testing.
- This tiny collection is much easier to debug than the initial failure
- We can infer further information from the initial & shrunk data:
  - Given that the simplified collection doesn’t contain the other numbers from the initial failing run, we can infer that those elements are most likely irrelevant to the test failure
  - Since `[]` is simpler than `[13]`, but test.check didn’t shrink all the way down to `[]`, we know that an empty vector would pass.
Multiple iterations per test run
One of the key parts of generative testing is avoiding the potential flakiness we might introduce through naive usage of random data. One of the simpler ways that test.check achieves this is by running multiple iterations per test run.
The key things to know here are:
- For tests created using `defspec`, when not specified the number of iterations is the value of `clojure.test.check.clojure-test/*default-test-count*`, which defaults to `100`.
- Otherwise, you can manually specify the number of iterations you want per test, like so: `(defspec my-test 200 ...)`
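For example, here’s a sketch (with an invented, trivially-true property) of a test that runs 500 iterations per run:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.test :refer [is]]
         '[clojure.test.check.clojure-test :refer [defspec]]
         '[com.gfredericks.test.chuck.clojure-test :as chuck])

;; 500 iterations per run instead of the default 100
(defspec addition-is-commutative 500
  (chuck/for-all [[a b] (s/gen (s/tuple int? int?))]
    (is (= (+ a b) (+ b a)))))
```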
When creating generative tests you might discover some inconsistent failures; if so, one thing you can do to make the failures happen with more regularity is to up the number of iterations. However, there’s only so much juice you can squeeze out of this before you make your test runs impractically slow. So if upping the iterations isn’t enough on its own, you probably need to understand test.check’s concept of sizing for its generators.
Size matters
A key feature of test.check’s data generators is that they each accept a size parameter that places bounds on the resulting data. Test.check uses this to generate smaller (or rather, simpler) data towards the start of a test run, then uses larger and larger size bounds as the test run goes on. This helps to make test runs more consistent and, in concert with shrinking, makes failure cases simpler.
In practice, you will rarely if ever use this size parameter directly; as mentioned above, it’s mostly used behind the scenes by test.check.
Sizing in action
We can experiment with size directly by using the `generate` function from the ns `clojure.test.check.generators`, which optionally accepts a size parameter.
To demonstrate this more easily though, we can create a small helper function, like so:
(require '[clojure.test.check.generators :as tgen])
(defn exercise-sizes
([generator sizes]
(exercise-sizes generator sizes 100000))
([generator sizes num-iterations]
(->> sizes
(map (fn [size]
[size
(->> (repeatedly num-iterations
#(tgen/generate generator size))
(into (sorted-set)))]))
(into (sorted-map)))))
(exercise-sizes (s/gen integer?) (range 5))
;;=>
{0 #{-1 0}
1 #{-1 0}
2 #{-2 -1 0 1}
3 #{-4 -3 -2 -1 0 1 2 3}
4 #{-8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7}}
(Note: we’re using `test.check`’s generators namespace directly here, aliased as `tgen`, since it contains some extra arity versions of functions that sadly aren’t included in the usual `clojure.spec.gen.alpha` namespace. We generally prefer the latter because it lazy-loads the generator functionality, which then lets us specify our generators alongside our specs in our production code, but without needing to include the `test.check` dependency in a production build.)
Size is abstract
First of all, we can observe that the size parameter is abstract; i.e. although larger size values generally result in a wider range of inputs, it’s not as simple as max value = size, as we can see in the ranges of values produced by `(s/gen integer?)` above.
Some generators ignore size
Some generators take no notice of sizing and so just generate values within the same range regardless.
A good example of this is `(s/gen uuid?)`, which generates UUIDs completely randomly in the usual way regardless of sizing:
(exercise-sizes (s/gen uuid?) (range 3) 2)
;;=>
{0
#{#uuid "dfca3a57-3100-4c5c-8c90-5dbaa93d859b"
#uuid "212df58f-eb0d-487b-8225-1c61ad56e6d8"}
1
#{#uuid "eeb9d663-fb88-45ee-a4ba-9b184720425a"
#uuid "185f2d24-88a7-43d4-9b9b-b715d3ebe0ae"}
2
#{#uuid "d837eafb-0803-4a99-847a-e4533a8db643"
#uuid "fc3bce75-6684-4438-a8b1-9ae589aa52e3"}}
A less obvious example is `(s/gen boolean?)`, which generates `true` or `false` with equal likelihood regardless of size:
(exercise-sizes (s/gen boolean?) (range 5))
;;=>
{0 #{false true}
1 #{false true}
2 #{false true}
3 #{false true}
4 #{false true}}
Increasing size of test runs
For test runs, the size used for each iteration cycles from 0 through to 199, then cycles back to 0 again, i.e.:
- iteration 0: size 0
- iteration 1: size 1
- iteration 2: size 2
- ...
- iteration 199: size 199
- iteration 200: size 0
- iteration 201: size 1
In particular, this means that it’s a good idea to specify your number of iterations as 200 or more, since otherwise you don’t cover as big a range of possible data; remember that `defspec` defaults to 100 iterations if you don’t specify it. (This is an odd choice by the `defspec` implementation IMO, especially since the underlying raw test.check test functions default to 200.)
We can see this in action by making use of the `sample` helper function, which behaves in the same way:
(require '[clojure.spec.gen.alpha :as gen])
(gen/sample (s/gen integer?))
;;=> (0 -1 0 -1 -1 1 -7 -18 -5 -112)
We can see that the deeper in the sequence the values appear, the more likely they are to be larger.
To make this a bit more obvious, let’s specify a larger number of values to generate:
(gen/sample (s/gen integer?) 20)
;;=> (-1 -1 0 -4 0 -7 -4 15 -1 17 -479 440 -15 -1615 64 -2 213 924 -1343 11157)
Section 3: Tips for writing good generative tests
The bare minimum: exposing exceptions
The good news is that even if we can’t think of some good general properties to check, at the very least we can verify that our functions get as far as returning something without blowing up with an exception.
For example, suppose we are testing a function like this speed calculation function:
(defn speed [distance time]
(/ distance time))
This is the sort of function where the real meat of the logic is arguably better demonstrated through traditional example tests. However, we can chuck some numbers through it to expose the divide-by-zero error:
(defspec speed-gen-test 200
(chuck/for-all [[distance time] (s/gen (s/tuple double? double?))]
(speed distance time)
(is true "Hey, at least we didn't blow up!")))
(speed-gen-test)
;;=> java.lang.ArithmeticException: Divide by zero
...
:smallest [{distance 1.0, time 0.0}]
...
This shows that our function’s spec needs to be improved; we need to exclude zero:
(s/def ::non-zero-double
(s/and double?
(complement zero?)))
(defn speed [distance time]
(/ distance time))
(defspec speed-gen-test 200
(chuck/for-all [[distance time] (s/gen (s/tuple double? ::non-zero-double))]
(speed distance time)
(is true "Hey, at least we didn't blow up!")))
(speed-gen-test)
;;=> OK
Improving input & output specs with instrumentation
We can improve our generative tests indirectly by making use of instrumentation - i.e. automatic validation of function inputs against specs. This helps us to discover functions whose input specs are too strict. Then as we widen the scope of our specs, our corresponding generative tests will cover more ground.
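(For the spec assertion failures shown in this section to actually be thrown, instrumentation needs to be switched on first; the standard way is via `clojure.spec.test.alpha`:)

```clojure
(require '[clojure.spec.test.alpha :as st])

;; Instrument all currently-loaded functions that have fdef specs;
;; calls whose arguments fail their :args spec will now throw
(st/instrument)
```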
For example, suppose we were trying to spec out the `sort` function, using almost the same generative test as for our initial `broken-sort` example. (However, we’ll create a duplicate `my-sort` function in order not to break the real `sort`, since doing so would break our REPL in surprising ways!)
(s/def ::sortable
(s/coll-of integer?))
(defn my-sort [coll]
(sort coll))
(s/fdef my-sort
:args (s/cat :coll ::sortable))
(my-sort [2 1])
;;=> (1 2)
(defspec my-sort-gen-test
  (chuck/for-all [input-coll (s/gen ::sortable)]
    (let [output-coll (my-sort input-coll)]
      (testing "Result is in ascending order"
        (when (seq input-coll)
          (is (apply <= output-coll))))
      (testing "The sorted collection contains the same elements"
        (is (= (group-by identity input-coll)
               (group-by identity output-coll)))))))
(my-sort-gen-test) ;;=> [passes]
(my-sort [1.2 1.1])
;;=> Exception: Spec assertion failed!
Here we’ve naively assumed we’re only sorting integers, but our instrumentation has helped us find that this assumption was incorrect. (In practice, the instrumentation would more likely be exposing issues during integration testing or ad-hoc manual tests rather than REPL interaction - but you get the idea.)
So, let’s expand our `::sortable` spec a little - why not expand it out to any valid number? This gets past our instrumentation, but our generative test now fails:
(s/def ::sortable
(s/coll-of number?))
(my-sort [2 1])
;;=> (1 2)
(my-sort [1.2 1.1])
;;=> (1.1 1.2)
(my-sort-gen-test)
;;=> {:smallest [##NaN]}
Our generative test has shown us that our new `::sortable` spec, meant to include all numbers, also (unintuitively) includes `##NaN` (Not a Number). But due to quirks of `##NaN`, it doesn’t play well with sorting. And sure enough, the Clojure comparators guide recommends removing all occurrences of `##NaN` before sorting a collection.
Therefore it makes sense for our `::sortable` spec to exclude `##NaN`. With this done, our generative test passes again:
(s/def ::sortable
(s/coll-of (s/and number?
(complement NaN?))))
(my-sort-gen-test)
;;=> [passes]
If we were depending on example-based tests alone it could have been all too easy to just add or update tests that try some plain doubles, while forgetting about `##NaN`.
The key thing here is that instrumentation works well in concert with generative tests to help us get our specs just right; instrumentation helps expose places where our specs are too strict, while generative tests help us ensure that we don’t expand our specs out too widely in response.
Finding general properties to check
One of the main challenges of writing generative tests is finding good general properties to check, without writing a test so sophisticated that it pretty much duplicates the logic of your code under test.
In most cases we can’t nail down the functionality as completely as we can with something like a sort function. But we can still get quite a lot of coverage by testing the following things:
1. Properties about the input
2. Properties about the output
3. Properties about how the output relates to the input
We get (1) from instrumentation, so we’re going to focus here on (2) & (3).
For example, suppose we were writing a generative test for the camel-snake-kebab library’s `->kebab-case` function. In case you’re not already aware, this converts variously-cased strings to kebab case, like so:
(require '[camel-snake-kebab.core :as csk])
(csk/->kebab-case "fooBar") ;;=> "foo-bar"
(csk/->kebab-case "Foo_Bar") ;;=> "foo-bar"
We can’t write a test that completely describes the expected outputs given a generalized input without duplicating the logic implemented by `->kebab-case` in the first place. But we can at least pin down some general properties, even if they don’t completely describe the expected behavior.
Let’s start with the following:
- a) Result contains no uppercase characters or underscores
- b) Ordering of letters is preserved
Which we can implement as a generative test like so:
(require '[clojure.string :as str])

(defspec kebab-case-gen-test 200
(chuck/for-all [input (gen/string)]
(let [output (csk/->kebab-case input)]
(testing "Result contains no uppercase characters or underscores"
(is (re-matches #"[^A-Z_]*" output)))
(testing "Letter ordering is preserved"
(letfn [(lower-and-strip [s]
(-> (str/lower-case s)
(str/replace #"[-_]" "")))]
(is (= (lower-and-strip input)
(lower-and-strip output))))))))
However, this test fails, with a simplest failing input of `" "` (a single space).
(You may notice that we’re using `gen/string` here as our generator rather than `(s/gen string?)`; this is because the latter only generates alphanumeric characters.)
Let’s try this input out in the REPL to see what the problem is:
(csk/->kebab-case " ") ;;=> ""
Aha! We’ve stumbled across the fact that `->kebab-case` also strips whitespace:
(csk/->kebab-case "foo bar") ;;=> "foo-bar"
Let’s update our test to reflect this:
(defspec kebab-case-gen-test 200
(chuck/for-all [input (gen/string)]
(let [output (csk/->kebab-case input)]
(testing (str "Result contains no uppercase characters, "
"underscores or whitespace")
(is (re-matches #"[^A-Z_\s]*" output)))
(testing "Letter ordering is preserved"
(letfn [(lower-and-strip [s]
(-> (str/lower-case s)
(str/replace #"[-_\s]" "")))]
(is (= (lower-and-strip input)
(lower-and-strip output))))))))
This passes - great!
What this shows is that generative testing helps us find test cases that we might not have thought of otherwise. We could have written plenty of example-based tests where we didn’t take into account whitespace, but using test.check has helped to expose this behavior.
Each iteration needs to be fast
One of the main restrictions of generative testing is that you only get the most out of it if each iteration is lightning-quick. Too slow, and you’re forced to choose between either bloating your test suite’s run time, or dropping the number of data generation iterations, weakening the test’s reliability.
Ideally the function under test should be a fast, pure function; failing that, you’ll need to stub out any slow operations such as I/O through techniques such as parameterization or using `with-redefs`.
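As a sketch (the `fetch-price` and `total-price` functions here are invented for illustration), a slow dependency can be stubbed out with `with-redefs` so that each generative iteration stays fast:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.test :refer [is]]
         '[clojure.test.check.clojure-test :refer [defspec]]
         '[com.gfredericks.test.chuck.clojure-test :as chuck])

;; Hypothetical slow dependency - imagine this makes an HTTP call
(defn fetch-price [product-id]
  (Thread/sleep 1000)
  42)

(defn total-price [product-ids]
  (reduce + 0 (map fetch-price product-ids)))

(defspec total-price-gen-test 200
  (chuck/for-all [ids (s/gen (s/coll-of string?))]
    ;; Stub out the slow call so each iteration stays fast
    (with-redefs [fetch-price (constantly 10)]
      (is (= (* 10 (count ids)) (total-price ids))))))
```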
Don’t neglect example-based tests
As powerful as generative tests are, we shouldn’t suddenly turn our back on good old example-based tests. Let’s consider some of the advantages of them to see why:
A picture tells a thousand words
Automated tests don’t just help ensure correctness, they also act as documentation. And as clojuredocs.org shows, good documentation is greatly aided by a few concrete examples to help you really grok what a function does.
Better test output
Unfortunately, even when using `test.chuck`’s version of `for-all`, we don’t get the same detailed output for assertion failures that we would get when running an example-based test. `test.chuck` allows us to use `is`, but we still don’t get told exactly which assertion failed.
Example-based tests give much better output for the particulars of a given test failure. Therefore when encountering a generative test failure it can be worth creating an example-based test for that test, even just temporarily, in order to better understand what’s going wrong.
Easier debugging
Similar to the previous point, example-based tests are much easier to debug using interactive debuggers, logging and other traditional debugging techniques. This is because they will usually be making a single call to your function under test, or at least not very many.
In contrast, generative tests are impractical for interactive debugging due to the large number of executions and unpredictable input, so this is another good reason to create dedicated example-based tests for particular failure cases you want to investigate.
Fast
“Everything is fast for small n”. This is true up to a point; while it’s certainly possible to create slow, bloated example-based tests, it’s much easier to make them fast than generative tests. If nothing else you have a bit more leeway to perform some I/O or other slow operations within your test, which gives you more flexibility.
Easier to write
If you find yourself agonizing too long over how to write the perfect generative test for your function that would specify its behavior completely & elegantly, then you may be better off just writing some example-based tests instead; perhaps you can come back to the generative version later.
Don’t let perfect be the enemy of good, and don’t feel like you’re copping out by writing example-based tests.
Section 4: Creating generators
One key challenge for creating generative tests is writing effective generators.
We’ve seen already that we can get very far with creating generators just by calling `s/gen` on a spec. This works for simple generators, but there are some key things to know to avoid getting stuck.
We’ll start by looking at how to make the most of the generators that you can create directly from specs, then look at a few more advanced techniques for creating more specialized generators.
Key functions recognized by spec
The first potential obstacle to be aware of is that only certain functions & predicates known by `s/gen` can be used for creating generators. We can create a spec from any predicate, but any given predicate is opaque from the perspective of spec unless spec “knows” about it.
For example, the following behave the same with respect to validation:
(s/valid? boolean? true)
;;=> true
(s/valid? #(instance? Boolean %) true)
;;=> true
However, only the first one can be used to create a generator via `s/gen`:
(gen/sample (s/gen boolean?))
;;=> (true true false true true false true false true false)
(gen/sample (s/gen #(instance? Boolean %)))
;;=> ERROR! ("Spec assertion failed")
Spec knows how to generate values for `boolean?`, but not for our second predicate, even though they amount to the same thing.
Therefore if you happen to have some specs lying around that use idiosyncratic predicates like this, you may have to rework them a little to use more standard ones.
The generation-aware functions you can use are:
- Spec-defining functions in `clojure.spec.alpha`, such as `coll-of`, `keys`, `int-in`, `inst-in`, etc.
- Clojure core predicates listed on the Clojure Cheat Sheet under “Predicates with test.check generators”
- Hash sets (i.e. standard Clojure sets, not sorted-set instances; the generator just chooses a random element from the set)
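For instance, a set literal works as both a predicate and a generator; each generated value is a random element of the set:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Sets are valid specs; the generator picks random elements
(gen/sample (s/gen #{:small :medium :large}) 5)
```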
Later on, we’ll look at using `s/with-gen` to provide spec with knowledge of how to generate values for a predicate.
Using `s/and` & `s/or`
As well as allowing us to create compound predicates more succinctly, `s/and` & `s/or` have the additional key benefit that they stop a predicate from becoming opaque to `s/gen`.
For example, compare
(gen/sample (s/gen (s/and integer?
#(not= 1 %))))
;;=> (-1 -1 -1 -2 0 -1 0 0 0 4)
with
(gen/sample (s/gen (fn [x]
(and (integer? x)
(not= 1 x)))))
;;=> ERROR!
Again, in terms of validation these behave the same way, but for generation purposes only the first one works. In the latter case spec only sees an anonymous function; it can’t “peek” inside to see the use of `integer?`. In the first case though, the use of `s/and` allows spec to recognize this usage; it will generate values using `integer?`, and internally filter those generated using `#(not= 1 %)`.
`s/or` behaves similarly:
(gen/sample (s/gen (s/or :integer integer?
:boolean boolean?)))
;;=> (-1 -1 false -2 true -1 true 0 false 4)
with
(gen/sample (s/gen (fn [x]
(or (integer? x)
(boolean? x)))))
;;=> ERROR!
Ordering of arguments in `s/and`
It’s worth highlighting something we just touched upon for `s/and`: the first argument passed to `s/and` is used to determine the “base” generator function; the rest are used to filter the generated results.
For example:
(gen/sample (s/gen (s/and integer?
#(not= 1 %))))
;;=> (-1 -1 -1 -2 0 -1 0 0 0 4)
with
(gen/sample (s/gen (s/and #(not= 1 %)
integer?)))
;;=> ERROR!
In the latter case, spec will look at `#(not= 1 %)` in order to generate values, but won’t know what to do with it!
The limits of `s/and`’s internal filtering
When performing the internal filtering described above, generators created using `s/gen` will only make a certain number of attempts to generate a value, after which they give up. This means that if the secondary predicates in an `s/and` form are too restrictive, then the generator may rarely work (or not work at all): the chance of the internally generated values passing the internal filtering is too low.
For example:
(gen/sample (s/gen (s/and integer?
#(<= 1000 %)
#(<= % 1020))))
;;=> Error: "Couldn't satisfy such-that predicate after 100 tries"
Internally, the created generator will generate integers in the same way as `(s/gen integer?)` would, and then check whether each one is between 1000 & 1020. The chance of this is low enough that after 100 tries the generator fails to find such a value most of the time.
This means that if the secondary predicates within our `s/and` are doing more than excluding exceptional cases, then we need to take advantage of other techniques to create valid generators.
Using spec ns functions to create generator-friendly specs
A good rule of thumb for avoiding failing generators is to favor usage of functions within the `clojure.spec.alpha` namespace where possible.
For example, the earlier example would be better written using `s/int-in`:
(gen/sample (s/gen (s/int-in 1000 1021)))
;;=> (1001 1001 1002 1001 1000 1001 1006 1002 1002 1002)
For spec functions like this, rather than internally generating a value only to potentially throw it away due to predicates, spec can instead choose values more intelligently and so can guarantee that a value is generated on each attempt.
Another key way of making better generators is by taking advantage of the optional keyword arguments you can pass to s/coll-of. It’s a function you’re almost certainly already using when making specs, but you might not be aware of the following keys:
- :kind, for specifying the collection type
- :count, for specifying a collection which must have an exact number of elements
- :min-count & :max-count, for (you guessed it) specifying a minimum and/or maximum number of elements for the collection
- :distinct, for ensuring that all elements in the collection are distinct
- :gen-max, for specifying a maximum number of elements to be generated.
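As a quick sketch of how these options combine (the spec name here is hypothetical, assuming the usual s and gen aliases), a spec for a set of exactly three keywords might look like:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; A set of exactly three keywords: :kind constrains the collection
;; type, :count fixes its size (sets are distinct by nature).
(s/def ::three-keywords
  (s/coll-of keyword? :kind set? :count 3))

(gen/sample (s/gen ::three-keywords) 3)
```

Because the generator knows about these constraints up front, every generated value satisfies them directly rather than being filtered after the fact.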
:gen-max in particular is good to be aware of: by default it is 20, meaning your generated collections will contain at most 20 elements. This default is presumably in place to avoid deeply nested collections containing an unmanageable number of elements. However, if your test needs larger collections in order to be meaningful, you will need to increase this value. (See the docstring for s/every for more details.)
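As a minimal sketch (the spec names here are hypothetical), raising :gen-max lets the generator produce collections beyond the default cap of 20 elements:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Default cap: generated collections have at most 20 elements.
(s/def ::ints (s/coll-of int?))

;; Raised cap: collections of up to 100 elements can now be generated.
(s/def ::more-ints (s/coll-of int? :gen-max 100))

(apply max (map count (gen/sample (s/gen ::more-ints) 100)))
;; usually well above 20
```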
For example, suppose we want a spec for
- A collection of ints in the range 1000 to 1020 (inclusive)
- Min collection size 10
- All elements distinct
The generator will almost certainly fail if we use predicate functions for this:
(gen/sample
(s/gen
(s/and (s/coll-of (s/int-in 1000 1021))
#(<= 10 (count %))
#(apply distinct? %))))
;;=> Error!
But the generation will work just fine when making use of the equivalent arguments to s/coll-of:
(gen/sample
(s/gen
(s/coll-of (s/int-in 1000 1021)
:min-count 10
:distinct true)))
;;=> ([1016 1010 1003 1000 1013...
Creating new generators with s/with-gen & gen/fmap
As we’ve seen, we can get very far simply by composing together predicates and various spec functions. Sooner or later though we’ll come across data that requires us to write our own generators.
Thankfully though, we rarely (if ever) need to write a generator from scratch. The two main tools we’ll use for creating generators are:
- s/with-gen, for declaring a spec with an accompanying generator
- gen/fmap, for creating a generator based on another generator, which simply transforms each value generated
For example, suppose we want to create a spec & generator for strings which are
valid UUIDs. If we simply use s/and
here then, as we’ve seen many times now,
we’ll have a valid spec but a broken generator:
(gen/sample (s/gen (s/and string?
parse-uuid)))
;;=> Error: "Couldn't satisfy such-that predicate after 100 tries"
Recall that because of s/and
, internally the generator will generate values
based upon string?
and then filter values for which parse-uuid
returns a
truthy value (i.e. valid UUID strings). This doesn’t work because the chance of
any random string being a valid UUID is extremely small. Nor are there any
alternate spec predicate functions or variants we can take advantage of.
In this situation we can reach for s/with-gen and gen/fmap, like so:
(s/def ::uuid-str
(s/with-gen
(s/and string? parse-uuid)
#(gen/fmap str (s/gen uuid?))))
(gen/generate (s/gen ::uuid-str))
;;=> "989e05e3-59ae-4e70-939f-ceda36f70cfd"
(s/valid? ::uuid-str "989e05e3-59ae-4e70-939f-ceda36f70cfd")
;;=> true
Here we declare a spec with its name in the usual way using s/def
, but we wrap
our spec definition with s/with-gen
. This function takes two arguments: the
predicate which makes up our spec, and a no-argument function which returns a
generator when called. (Note the #
used to place our generator definition
within a parameterless function literal.) Our specified generator will now be
returned whenever s/gen
is called on our spec.
gen/fmap simply returns a new generator which uses the generator it’s given to generate values, then applies the passed function to each generated value. So here our new generator will generate java.util.UUIDs and then convert each one to a string using str.
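As a minimal standalone sketch of gen/fmap (independent of the UUID example; the var name is hypothetical), here we transform a generator of integers into one of their string representations:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; fmap applies str to every integer the inner generator produces.
(def int-str-gen (gen/fmap str (s/gen int?)))

(gen/sample int-str-gen 5)
;; e.g. ("0" "-1" "2" "-3" "0")
```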
Using gen/fmap for better performance
We don’t necessarily need to wait until our generator fails to generate values
before reaching for gen/fmap
(or indeed, the other techniques we’ve looked at
so far); we can benefit from creating our own generators simply to get better
performance for our tests.
While the standard wisdom of avoiding premature optimization still generally applies to generative testing, I’d argue that it’s worth prioritizing performance concerns a little higher than you normally would because of our need to run so many iterations. As you add more and more generative tests, you may find your test suite slowing down, in which case your tests, specs and/or generators may need a bit of TLC.
For example, suppose we have created a simple spec for collections of integers that are multiples of 10, using s/and:
(s/def ::multiples-of-10
(s/coll-of (s/and integer?
#(= 0 (rem % 10)))
:gen-max 200))
This works well enough, but performance could be better:
(time
(do
(doall
(gen/sample (s/gen ::multiples-of-10)
200))
nil))
;;=> ~500ms
Recalling how s/and affects generation, notice that for multiples-of-10’s generator the #(= 0 (rem % 10)) predicate acts as a filter on the integers generated by integer?; a lot of results are being thrown away, which has a performance cost.
We can avoid wasting these generation iterations by ensuring that every raw generated value satisfies the spec: we can generate an integer, then multiply it by 10 to ensure that it is indeed a multiple of 10, using fmap:
(time
(do
(doall
(gen/sample
(gen/fmap
(fn [coll]
(mapv #(* 10 %) coll))
(s/gen (s/coll-of
(s/int-in
(/ Long/MIN_VALUE 10)
(/ Long/MAX_VALUE 10))
:gen-max 200)))
200))
nil))
;;=> ~100ms
This version is noticeably faster since we’re no longer wasting cycles throwing away generated values.
Here we explicitly create our “inner” generator in the second argument to
fmap
; this simply generates raw integers, but with a limited range so that we
don’t get integer overflows when the mapping function multiplies by 10.
Since we’ve made a more performant generator, we may as well use s/with-gen to redefine our spec to use it:
(s/def ::multiples-of-10
(s/with-gen
(s/coll-of (s/and integer?
#(= 0 (rem % 10))))
(fn []
(gen/fmap
(fn [coll]
(mapv #(* 10 %) coll))
(s/gen
(s/coll-of
(s/int-in
(/ Long/MIN_VALUE 10)
(/ Long/MAX_VALUE 10))
:gen-max 200))))))
Spec generator quirks & gotchas
There are some quirks about certain spec generators that are good to know about; in some cases you may wish to reach for a generator from the clojure.spec.gen.alpha namespace instead.
(s/gen string?) only generates alphanumeric characters
(s/gen string?) will never give you characters outside of letters and numbers. This is a real shame, because it’s when you introduce other characters that you can really expose bugs in your string-processing functions.
In contrast, consider using gen/string-ascii
, which generates from the ASCII
range of characters, or gen/string
, which can generate even unprintable
characters.
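To see the difference at the REPL (a quick sketch; recall that in clojure.spec.gen.alpha the test.check generators are wrapped as no-arg functions, hence the extra parens):

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Alphanumeric characters only:
(gen/sample (s/gen string?) 5)

;; Printable ASCII, including punctuation and whitespace:
(gen/sample (gen/string-ascii) 5)

;; Arbitrary characters, including unprintable ones:
(gen/sample (gen/string) 5)
```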
The default maximum size for s/coll-of generators is 20
As we noted earlier, by default generators created from an s/coll-of spec have a gen-max option which defaults to 20. Be sure to increase this if handling larger collection sizes is key for your test.
s/coll-of vs gen/vector sizing behavior
These two generator types use the size parameter differently:
- s/coll-of uses size to scale the contained elements, but the collection size is completely random
- gen/vector uses size to determine the size of the collection
For example:
(gen/sample (s/gen (s/coll-of integer?)) 3)
;; => ([-1 -1 -1 -1 -1 -1 0 0 -1 0 -1 0 -1 0 0 0 0 -1 0 -1]
;; [-1 0 -1 -1 -1 -1 -1 -1 0 0]
;; [0 0 0 -1 0 1 -1 0 -1 -1 0 -1])
(gen/sample (gen/vector (gen/large-integer)) 3)
;; => ([] [] [1 1])
This means that using gen/vector
may be preferable if your function under test
is more sensitive to the size of the collection than the contained elements.
Using ad-hoc generated data in example-based tests
After getting comfortable with making generators, you may be wondering about using them to generate data for your example-based tests. You can do this, but you should be very careful if you do! Remember that the key features provided by test.check are crucial for creating tests that are reliable and useful. I often see example-based tests with some or all of their inputs based on direct calls to gen/generate, presumably to avoid the drudgery of hand-crafting example inputs for large input maps. Sadly, such tests are often flaky: each run generates a fresh random input, so failures can appear and disappear between runs.
If you simply want to avoid hand-crafting data for example-based tests, use one of the following techniques to use ad-hoc data generation safely:
- save your generated test data: call the generator in a REPL session, then save the output into your test file
- generate using a seed: use the clojure.test.check.generators namespace directly (as opposed to the lazily-loaded clojure.spec.gen.alpha version), calling the 3-parameter version of generate, which allows you to specify a seed. This way each test run is completely consistent. This is useful for extremely large input collections which you don’t want to save as a literal in your test file.
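For example (a sketch; the var names are hypothetical, and note that we require clojure.test.check.generators directly since the spec.gen.alpha wrapper doesn’t expose the seed arity), passing the same size and seed to generate yields the same value every time:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.test.check.generators :as tgen])

;; (tgen/generate gen size seed): the same size + seed always
;; yields the same value, so tests using this data are deterministic.
(def input-a (tgen/generate (s/gen int?) 30 42))
(def input-b (tgen/generate (s/gen int?) 30 42))

(= input-a input-b)
;;=> true
```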
Conclusion
test.check is a generative testing library with great integration with spec. If your code is making use of spec already, then you’re halfway to creating powerful tests that can help you expose edge cases you might not have easily found otherwise.
The main challenges when creating generative tests are thinking of good general properties to check and creating reliable & efficient generators. Getting comfortable with these skills will reward you with the ability to get far more reliable unit tests than with example-based tests alone.