A Practical Guide to test.check

Here’s a pragmatic guide to generative testing in Clojure using test.check, oriented around spec.

One of spec’s main selling points is that it can be used for validation, instrumentation and generative testing, but in practice I don’t see very many codebases using spec also taking advantage of its integration with test.check. So if you’re already using spec - or at least familiar with it - then this guide is for you. But even if not, hopefully this guide will still give you a much better understanding of test.check and generative testing in Clojure.

Section 1: Quick Start

What is test.check?

In short, test.check is a Clojure library for performing generative tests. Generative tests - also known as property-based tests - are tests that run our code under test against random inputs, then check general properties that should apply across the range of possible randomly generated data. In contrast, we refer to traditional unit tests that operate on fixed inputs as example-based tests.

Your first generative test

To whet your appetite, let’s start with a simple example.

First off, you’ll need the following dependencies:

;; Main test.check dependency
org.clojure/test.check
{:mvn/version "1.1.1"}

;; Optional; extra tools, and (IMO)
;; better clojure.test integration
com.gfredericks/test.chuck
{:mvn/version "0.2.13"}

Then we can create the following failing test:

(require
 '[clojure.spec.alpha :as s]
 '[clojure.test
   :refer [is testing]]
 '[clojure.test.check.clojure-test
   :refer [defspec]]
 '[com.gfredericks.test.chuck.clojure-test
   :as chuck])

(defn broken-sort [coll]
  (if (some #{13} coll)
    nil
    (sort coll)))

(defspec broken-sort-gen-test
  (chuck/for-all [input-coll
                  (s/gen (s/coll-of int?))]
    (let [output-coll
          (broken-sort input-coll)]
      (testing
       "Result is in ascending order"
        (when (seq input-coll)
          (is (apply <= output-coll))))
      (testing
       "The sorted collection contains the same elements"
        (is (= (group-by identity
                         input-coll)
               (group-by identity
                         output-coll)))))))

(broken-sort-gen-test)
; => Fails! (95% of the time...)

The key things here are:

  • We define a test.check unit test using defspec. (NB the naming is a bit unfortunate - this has nothing to do with clojure.spec!)
  • We can create generators from specs using s/gen; we create a spec for a collection of integers on the fly with (s/coll-of int?), then grab a test data generator for it using s/gen
  • The for-all binding vector binds the generated value for each test run; the assertions are run every time a new test value is generated, and by default 100 test values are generated per test run

In your REPL try modifying broken-sort; you should find that the only implementation that consistently passes the test is a call to sort. Hopefully you can see already how powerful generative testing is; we can really nail down the correctness of the behavior we want with just a single test!

Formatting & linting

Out of the box, the macros we use here don’t play very nicely with cljfmt or clj-kondo. Thankfully though this is easily fixed with a couple of config files:

.cljfmt.edn:

{:extra-indents {for-all [[:inner 0]]}}

.clj-kondo/config.edn:

{:lint-as
 {clojure.test.check.clojure-test/defspec
  clojure.test/deftest

  com.gfredericks.test.chuck.clojure-test/for-all
  clojure.test.check.properties/for-all}}

Vanilla vs test.chuck’s for-all

In the above example I’ve used test.chuck’s version of for-all. In short, this is because the vanilla version of for-all doesn’t play nicely with using is assertions in tests; instead, test.check’s built-in for-all macro uses the overall truthiness of the body expression.

So, the test above rewritten to use the vanilla for-all would look like:

(require
 '[clojure.test.check.properties
   :as p])

(defspec broken-sort-gen-test-vanilla-for-all
  (p/for-all [input-coll
              (s/gen (s/coll-of int?))]
    (let [output-coll
          (broken-sort input-coll)]
      (and
       (testing
        "Result is in ascending order"
         (or (empty? input-coll)
             (and (seq output-coll)
                  (apply <= output-coll))))
       (testing
        "The sorted collection contains the same elements"
         (= (group-by identity
                      input-coll)
            (group-by identity
                      output-coll)))))))

In particular, note that we need to group both our assertions within an and form. This isn’t too bad on its own, but since we’re trying to integrate with clojure.test here (by using defspec), it makes sense to me to prefer an approach that lets us write tests more consistently.

Therefore from now on I’ll only use the test.chuck version of for-all. I’ll always explicitly prefix it with the ns alias chuck for the sake of clarity, but in your own code you’ll probably want to refer whichever version of for-all you choose.

Section 2: Key Features

Usually when unit testing we are rightfully wary of using randomly-generated data, since when used naively it can lead to flaky tests and hard-to-diagnose failures. Proper generative testing differs from naive usage of random data by ensuring deterministic behavior and ease of failure analysis.

Test.check provides this capability for reliable generative tests through the following key features:

  • Intelligently calculates the simplest failing inputs - test.check calls this shrinking
  • Runs multiple iterations per test run - this helps prevent flakiness
  • Intelligently generates data - “simpler” data is generated first
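As a rough illustration of the determinism point, here's a minimal sketch that runs the same (deliberately false) property twice with the same seed. The property, the var names and the seed value are all made up for the example; the only thing that matters is that the seed is reused.

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as tgen]
         '[clojure.test.check.properties :as p])

;; Hypothetical, deliberately false property:
;; "no generated vector ever contains 42".
(def no-42-prop
  (p/for-all [v (tgen/vector tgen/small-integer)]
    (not (some #{42} v))))

;; The seed is arbitrary; reusing it is what makes the two runs identical.
(def run-1 (tc/quick-check 1000 no-42-prop :seed 12345))
(def run-2 (tc/quick-check 1000 no-42-prop :seed 12345))

;; Both runs fail on the same iteration with the same generated input.
(= (:num-tests run-1) (:num-tests run-2))
(= (:fail run-1) (:fail run-2))
```

In practice this means that the :seed reported by a failing run can be passed back in to reproduce that exact failure.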

The power of Shrinking

One of the most powerful features of test.check is shrinking, where rather than simply spitting out the first failure it finds, it does some further work to automagically find a simpler failing test input for us.

To see this in action, let’s take a look at the failing test output for our earlier broken-sort-gen-test example:

{:shrunk
 {:total-nodes-visited 11,
  :depth 3,
  :pass? false,
  :result false,
  :result-data nil,
  :time-shrinking-ms 1,
  :smallest [{input-coll [13]}]},
 :failed-after-ms 5,
 :num-tests 13,
 :seed 1700759647930,
 :fail [{input-coll [15 36 -29 93 2 13 -756 -649 360 2]}],
 :result false,
 :result-data nil,
 :failing-size 12,
 :pass? false}

Under the :fail key we have [{input-coll [15 36 -29 93 2 13 -756 -649 360 2]}], which shows us that this particular test run failed with an input collection of [15 36 -29 93 2 13 -756 -649 360 2]. However, test.check automatically does some further digging for us to find that the simplest input collection that fails is [13].

This is very powerful! This result might feel obvious given how contrived our example is, but in more realistic circumstances this capability is very useful indeed:

  • We’ve generated a failing test case; even before we take any shrinking into account, right off the bat we’ve found a failure that we might not have found through traditional example-based testing.
  • This tiny collection is much easier to debug than the initial failure
  • We can infer further information from the initial & shrunk data:
    • Given that the simplified collection doesn’t contain any of the other numbers from the initial failing run, we can infer that those elements are most likely irrelevant to the test failure
    • Since [] is simpler than [13], but test.check didn’t shrink all the way down to [], we know that an empty vector would pass.
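If you want to poke at shrinking outside of clojure.test, a minimal sketch is to call test.check's quick-check function directly and pull the shrunk input out of the result map. This assumes a plain vector generator rather than the spec-based one used earlier, and the property and seed here are only for illustration.

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as tgen]
         '[clojure.test.check.properties :as p])

(defn broken-sort [coll]
  (if (some #{13} coll)
    nil
    (sort coll)))

;; Illustrative property: broken-sort should agree with sort.
(def agrees-with-sort-prop
  (p/for-all [input-coll (tgen/vector tgen/small-integer)]
    (= (sort input-coll) (broken-sort input-coll))))

;; Arbitrary seed, fixed only so the run is repeatable.
(def result
  (tc/quick-check 1000 agrees-with-sort-prop :seed 42))

;; The original failing arguments (some larger vector containing 13)
(:fail result)

;; The shrunk arguments
(get-in result [:shrunk :smallest])
;;=> [[13]]
```

Note that with the vanilla for-all the arguments are reported as a plain vector rather than the map of binding names that test.chuck gives us.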

Multiple iterations per test run

One of the key parts of generative testing is avoiding the potential flakiness we might introduce through naive usage of random data. One of the simpler ways that test.check achieves this is by running multiple iterations per test run.

The key things to know here are:

  • For tests created using defspec, when not specified the number of iterations is the value of clojure.test.check.clojure-test/*default-test-count*, which defaults to 100.
  • Otherwise, you can manually specify the number of iterations you want per test, like so:
(defspec my-test 200
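As a fuller sketch of setting the iteration count, here's a complete (if trivial) test; the test name and the property are made up for the example. Calling the function that defspec defines returns the result map, so we can check how many iterations actually ran.

```clojure
(require '[clojure.test.check.clojure-test :refer [defspec]]
         '[clojure.test.check.generators :as tgen]
         '[clojure.test.check.properties :as p])

;; Hypothetical test: 500 iterations instead of the default.
(defspec reverse-twice-is-identity 500
  (p/for-all [v (tgen/vector tgen/small-integer)]
    (= v (reverse (reverse v)))))

;; No arguments: uses the count given in the defspec form.
(def full-run (reverse-twice-is-identity))
(:num-tests full-run)
;;=> 500

;; One argument: overrides the count for this call only,
;; which is handy for a quick check in the REPL.
(def quick-run (reverse-twice-is-identity 50))
(:num-tests quick-run)
;;=> 50
```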

When creating generative tests you might discover some inconsistent failures; if so, one way to make the failures happen more regularly is to increase the number of iterations. However, there’s only so much juice you can squeeze out of this before your test runs become impractically slow. So if upping the iterations isn’t enough on its own, you probably need to understand test.check’s concept of sizing for its generators.

Size matters

A key feature of test.check’s data generators is that they each accept a size parameter that places bounds on the resulting data. Test.check uses this to generate smaller (or rather, simpler) data towards the start of a test run, then uses larger and larger size bounds as the test run goes on. This helps to make test runs more consistent and, in concert with shrinking, makes failure cases simpler.

In practice, you will rarely if ever use this size parameter directly; as mentioned above, it’s mostly used behind the scenes by test.check.

Sizing in action

We can experiment with size directly by using the generate function from the ns clojure.test.check.generators, which optionally accepts a size parameter.

To demonstrate this more easily though, we can create a small helper function, like so:

(require
 '[clojure.test.check.generators
   :as tgen])

(defn exercise-sizes
  ([generator sizes]
   (exercise-sizes generator sizes 100000))
  ([generator sizes num-iterations]
   (->>
    sizes
    (map
     (fn [size]
       [size
        (->> (repeatedly
              num-iterations
              #(tgen/generate generator
                              size))
             (into (sorted-set)))]))
    (into (sorted-map)))))

(exercise-sizes (s/gen integer?) (range 5))
;;=>
{0 #{-1 0}
 1 #{-1 0}
 2 #{-2 -1 0 1}
 3 #{-4 -3 -2 -1 0 1 2 3}
 4 #{-8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7}}

(Note: we’re using test.check’s generators namespace directly here, aliased as tgen, since it contains some extra arity versions of functions that sadly aren’t included in the usual clojure.spec.gen.alpha namespace.

We generally prefer the latter because it lazy-loads the generator functionality, which then lets us specify our generators alongside our specs in our production code, but without needing to include the test.check dependency in a production build.)

Size is abstract

First of all, we can observe that the size parameter is abstract; i.e. although larger size values generally result in a wider range of inputs, it’s not as simple as max value = size, as we can see in the ranges of values produced by (s/gen integer?) above.

Some generators ignore size

Some generators take no notice of sizing and so just generate values within the same range regardless.

A good example of this is (s/gen uuid?), which generates UUIDs completely randomly in the usual way regardless of sizing:

(exercise-sizes (s/gen uuid?) (range 3) 2)
;;=>
{0
 #{#uuid "dfca3a57-3100-4c5c-8c90-5dbaa93d859b"
   #uuid "212df58f-eb0d-487b-8225-1c61ad56e6d8"}
 1
 #{#uuid "eeb9d663-fb88-45ee-a4ba-9b184720425a"
   #uuid "185f2d24-88a7-43d4-9b9b-b715d3ebe0ae"}
 2
 #{#uuid "d837eafb-0803-4a99-847a-e4533a8db643"
   #uuid "fc3bce75-6684-4438-a8b1-9ae589aa52e3"}}

A less obvious example is (s/gen boolean?), which generates true or false with equal probability regardless of size:

(exercise-sizes (s/gen boolean?) (range 5))
;;=>
{0 #{false true}
 1 #{false true}
 2 #{false true}
 3 #{false true}
 4 #{false true}}

Increasing size of test runs

For test runs, the size used for each iteration cycles from 0 through to 199, then cycles back to 0 again.

i.e.

  • iteration 0: size 0
  • iteration 1: size 1
  • iteration 2: size 2
  • iteration 199: size 199
  • iteration 200: size 0
  • iteration 201: size 1

In particular, this means that it’s a good idea to specify your number of iterations as 200 or more, since otherwise you don’t cover as big a range of possible data; remember that defspec defaults to 100 iterations if you don’t specify it. (This is an odd choice by the defspec implementation IMO, especially since the underlying raw test.check test functions default to 200.)

We can see this in action by making use of the sample helper function, which behaves in the same way.

(require '[clojure.spec.gen.alpha :as gen])

(gen/sample (s/gen integer?))
;;=> (0 -1 0 -1 -1 1 -7 -18 -5 -112)

We can see that the later a value appears in the sequence, the more likely it is to be large.

To make this a bit more obvious, let’s specify a larger number of values to generate:

(gen/sample (s/gen integer?) 20)
;;=> (-1 -1 0 -4 0 -7 -4 15 -1 17 -479 440 -15 -1615 64 -2 213 924 -1343 11157)
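To see the sizes themselves rather than inferring them from the generated values, one option is a generator that just returns whatever size it was called with. This is a small sketch using test.check's sized and return functions; size-gen is a name made up for the example.

```clojure
(require '[clojure.test.check.generators :as tgen])

;; A generator that ignores randomness and returns its size parameter.
(def size-gen
  (tgen/sized (fn [size] (tgen/return size))))

;; Sample a few more than 200 values so we can see the wrap-around.
(def sizes (tgen/sample size-gen 203))

(take 5 sizes)
;;=> (0 1 2 3 4)

(take-last 4 sizes)
;;=> (199 0 1 2)
```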

Section 3: Tips for writing good generative tests

Now that we’re well-equipped to create reliable, performant generators, let’s take a look at how to write valuable test.check tests using them.

The bare minimum: exposing exceptions

The good news is that even if we can’t think of some good general properties to check, at the very least we can verify that our functions get as far as returning something without blowing up with an exception.

For example, suppose we are testing a function like this speed calculation function:

(defn speed [distance time]
  (/ distance time))

This is the sort of function where the real meat of the logic is arguably better demonstrated through traditional example tests. However, we can chuck some numbers through it to expose the divide-by-zero error:

(defspec speed-gen-test 200
  (chuck/for-all
      [[distance time]
       (s/gen (s/tuple double? double?))]
    (speed distance time)
    (is true
        "Hey, at least we didn't blow up!")))

(speed-gen-test)
;;=> java.lang.ArithmeticException: Divide by zero
...
:smallest [{distance 1.0, time 0.0}]
...

This shows that our function’s spec needs to be improved; we need to exclude zero:

(s/def ::non-zero-double
  (s/and double?
         (complement zero?)))

(defn speed [distance time]
  (/ distance time))

(defspec speed-gen-test 200
  (chuck/for-all
      [[distance time]
       (s/gen (s/tuple double?
                       ::non-zero-double))]
    (speed distance time)
    (is true
        "Hey, at least we didn't blow up!")))

(speed-gen-test)
;;=> OK

Improving input & output specs with instrumentation

We can improve our generative tests indirectly by making use of instrumentation - i.e. automatic validation of function inputs against specs. This helps us to discover functions whose input specs are too strict. Then as we widen the scope of our specs, our corresponding generative tests will cover more ground.

For example, suppose we were trying to spec out the sort function, using almost the same generative test as for our initial broken-sort example. (However, we’ll create a duplicate my-sort function in order not to break the real sort, since doing so would break our REPL in surprising ways!)

(s/def ::sortable
  (s/coll-of integer?))

(defn my-sort [coll]
  (sort coll))

(s/fdef my-sort
  :args (s/cat :coll ::sortable))

(my-sort [2 1])
;;=> (1 2)

(defspec my-sort-gen-test
  (chuck/for-all [input-coll
                  (s/gen ::sortable)]
    (let [output-coll
          (my-sort input-coll)]
      (testing "Result is in ascending order"
        (when (seq input-coll)
          (is (apply <= output-coll))))
      (testing (str "The sorted collection "
                    "contains the same elements")
        (is (= (group-by identity input-coll)
               (group-by identity output-coll)))))))

(my-sort-gen-test) ;;=> [passes]

(my-sort [1.2 1.1])
;;=> Exception: Spec assertion failed!

Here we’ve naively assumed we’re only sorting integers, but our instrumentation has helped us find that this assumption was incorrect. (In practice, the instrumentation would more likely be exposing issues during integration testing or ad-hoc manual tests rather than REPL interaction - but you get the idea.)
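Note that an fdef on its own doesn't check anything at call time; the failure above only appears once instrumentation has been switched on for the function. Here's a minimal sketch of that step using clojure.spec.test.alpha. The names my-int-sort and ::sortable-ints are made up so as not to clash with the definitions above.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

(s/def ::sortable-ints
  (s/coll-of integer?))

(defn my-int-sort [coll]
  (sort coll))

(s/fdef my-int-sort
  :args (s/cat :coll ::sortable-ints))

;; Without this call the :args spec is never checked.
(stest/instrument `my-int-sort)

;; Capture the exception data rather than letting it propagate,
;; so we can look at what instrumentation reports.
(def instrument-failure
  (try
    (my-int-sort [1.2 1.1])
    (catch clojure.lang.ExceptionInfo e
      (ex-data e))))

(::s/failure instrument-failure)
;;=> :instrument
```

You'd typically switch instrumentation on in dev and test environments only, since it adds a validation cost to every call.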

So, let’s expand our ::sortable spec a little - why not expand it out to any valid number? This gets past our instrumentation, but our generative test now fails:

(s/def ::sortable
  (s/coll-of number?))

(my-sort [2 1])
;;=> (1 2)

(my-sort [1.2 1.1])
;;=> (1.1 1.2)

(my-sort-gen-test)
;;=> {:smallest [##NaN]}

Our generative test has shown us that our new ::sortable spec, meant to include all numbers, also (unintuitively) includes ##NaN (Not a Number). But due to quirks of ##NaN, it doesn’t play well with sorting. And sure enough, the Clojure comparators guide recommends removing all occurrences of ##NaN before sorting a collection.

Therefore it makes sense for our ::sortable spec to exclude ##NaN. With this done, our generative test passes again:

(s/def ::sortable
  (s/coll-of (s/and number?
                    (complement NaN?))))

(my-sort-gen-test)
;;=> [passes]

If we were depending on example-based tests alone it could have been all too easy to just add or update tests that try some plain doubles, while forgetting about ##NaN.

The key thing here is that instrumentation works well in concert with generative tests to help us get our specs just right; instrumentation helps expose places where our specs are too strict, while generative tests help us ensure that we don’t expand our specs out too widely in response.

Finding general properties to check

One of the main challenges of writing generative tests is finding good general properties to check, without needing to write a test sophisticated enough that it pretty much duplicates the logic of your code under test.

In most cases we can’t nail down the functionality as completely as we can with something like a sort function. But we can still get quite a lot of coverage by testing the following things:

  1. Properties about the input
  2. Properties about the output
  3. Properties about how the output relates to the input

We get (1) from instrumentation, so we’re going to focus here on (2) & (3).

For example, suppose we were writing a generative test for the camel-snake-kebab library’s ->kebab-case function. In case you’re not already aware, this converts variously-cased strings to kebab case, like so:

(require '[camel-snake-kebab.core :as csk])
(csk/->kebab-case "fooBar") ;;=> "foo-bar"
(csk/->kebab-case "Foo_Bar") ;;=> "foo-bar"

We can’t write a test that completely describes the expected outputs given a generalized input without duplicating the logic implemented by ->kebab-case in the first place. But we can at least pin down some general properties, even if they don’t completely describe the expected behavior.

Let’s start with the following:

  • a) Result contains no uppercase characters or underscores
  • b) Ordering of letters is preserved

Which we can implement as a generative test like so:

(require '[clojure.string :as str])

(defspec kebab-case-gen-test 200
  (chuck/for-all [input
                  (gen/string)]
    (let [output
          (csk/->kebab-case input)]
      (testing
       "Result contains no uppercase characters or underscores"
        (is (re-matches #"[^A-Z_]*" output)))
      (testing
       "Letter ordering is preserved"
        (letfn [(lower-and-strip [s]
                  (-> (str/lower-case s)
                      (str/replace #"[-_]"
                                   "")))]
          (is (= (lower-and-strip input)
                 (lower-and-strip output))))))))

However, this test fails, with a simplest failing input of " " (space). (You may notice that we’re using gen/string here as our generator rather than (s/gen string?); this is because the latter only generates alphanumeric characters.)
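If you want to convince yourself of the difference between the two string generators, a quick sketch like the following should do it. The var names are made up, and the sample size of 200 is arbitrary.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Strings from the spec-derived generator
(def spec-strings
  (gen/sample (s/gen string?) 200))

;; Every one of them consists solely of ASCII letters and digits.
(def all-alphanumeric?
  (every? #(re-matches #"[A-Za-z0-9]*" %) spec-strings))

all-alphanumeric?
;;=> true

;; Strings from gen/string draw from a wider range of characters,
;; so we'd expect at least some to contain whitespace, punctuation etc.
(def wider-strings
  (gen/sample (gen/string) 200))

(filter #(re-find #"[^A-Za-z0-9]" %) wider-strings)
```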

Let’s try this input out in the REPL to see what the problem is:

(csk/->kebab-case " ") ;;=> ""

Aha! We’ve stumbled across the fact that ->kebab-case also strips whitespace:

(csk/->kebab-case "foo  bar") ;;=> "foo-bar"

Let’s update our test to reflect this:

(defspec kebab-case-gen-test 200
  (chuck/for-all [input
                  (gen/string)]
    (let [output
          (csk/->kebab-case input)]
      (testing
       (str "Result contains no uppercase "
            "characters, underscores or whitespace")
        (is (re-matches #"[^A-Z_\s]*"
                        output)))
      (testing "Letter ordering is preserved"
        (letfn [(lower-and-strip [s]
                  (-> (str/lower-case s)
                      (str/replace #"[-_\s]"
                                   "")))]
          (is (= (lower-and-strip input)
                 (lower-and-strip output))))))))

This passes - great!

What this shows is that generative testing helps us find test cases that we might not have thought of otherwise. We could have written plenty of example-based tests where we didn’t take into account whitespace, but using test.check has helped to expose this behavior.

Each iteration needs to be fast

One of the main restrictions of generative tests is that you only get the most out of them if each iteration is lightning-quick. Too slow, and you’re forced to choose between either bloating your test suite’s run time, or dropping the number of data generation iterations, weakening the test’s reliability.

Ideally the function under test should be a fast, pure function; failing that, you’ll need to stub out any slow operations such as I/O through techniques such as parameterization or using with-redefs.
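As a sketch of the with-redefs approach, suppose the function under test depends on a slow lookup. Everything here (fetch-exchange-rate, convert, the fixed rate of 2) is hypothetical and only there to show the shape of the technique.

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as tgen]
         '[clojure.test.check.properties :as p])

;; Hypothetical slow dependency, standing in for a network call.
(defn fetch-exchange-rate [currency]
  (Thread/sleep 200)
  2)

(defn convert [amount currency]
  (* amount (fetch-exchange-rate currency)))

;; The stub replaces the slow call for the duration of each iteration,
;; so 200 iterations take milliseconds rather than 40+ seconds.
(def convert-prop
  (p/for-all [amount (tgen/large-integer* {:min 0 :max 1000000})]
    (with-redefs [fetch-exchange-rate (constantly 2)]
      (= (* 2 amount) (convert amount :eur)))))

(def result (tc/quick-check 200 convert-prop))

(:pass? result)
;;=> true
```

The parameterization alternative would be to have convert accept the rate (or a rate-lookup function) as an argument, which avoids redefining vars at all.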

Don’t neglect example-based tests

As powerful as generative tests are, we shouldn’t suddenly turn our back on good old example-based tests. Let’s consider some of the advantages of them to see why:

A picture tells a thousand words

Automated tests don’t just help ensure correctness, they also act as documentation. And as clojuredocs.org shows, good documentation is greatly aided by a few concrete examples to help you really grok what a function does.

Better test output

Unfortunately, even when using test.chuck’s version of for-all, we don’t get the same detailed output for assertion failures that we would get when running an example-based test. test.chuck allows us to use is, but we still don’t get told exactly which assertion failed.

Example-based tests give much better output for the particulars of a given test failure. Therefore when encountering a generative test failure it can be worth creating an example-based test for that test, even just temporarily, in order to better understand what’s going wrong.

Easier debugging

Similar to the previous point, example-based tests are much easier to debug using interactive debuggers, logging and other traditional debugging techniques. This is because they will usually be making a single call to your function under test, or at least not very many.

In contrast, generative tests are impractical for interactive debugging due to the large number of executions and unpredictable input, so this is another good reason to create dedicated example-based tests for particular failure cases you want to investigate.

Fast

“Everything is fast for small n”. This is true up to a point; while it’s certainly possible to create slow, bloated example-based tests, it’s much easier to make them fast compared to generative tests. If nothing else you have a bit more leeway to perform some I/O or other slow operations within your test, which gives you more flexibility.

Easier to write

If you find yourself agonizing too long over how to write the perfect generative test for your function that would specify its behavior completely & elegantly, then you may be better off just writing some example-based tests instead; perhaps you can come back to the generative version later.

Don’t let perfect be the enemy of good, and don’t feel like you’re copping out by writing example-based tests.

Section 4: Creating generators

One key challenge for creating generative tests is writing effective generators. We’ve seen already we can get very far with creating generators just by calling s/gen on a spec. This works for simple generators, but there are some key things to know to avoid getting stuck.

We’ll start by looking at how to make the most of the generators that you can create directly from specs, then look at a few more advanced techniques for creating more specialized generators.

Key functions recognized by spec

The first potential obstacle to be aware of is that only certain functions & predicates known by s/gen can be used for creating generators. We can create a spec from any predicate, but any given predicate is opaque from the perspective of spec unless spec “knows” about it.

For example, the following behave the same with respect to validation

(s/valid? boolean? true)
;;=> true

(s/valid? #(instance? Boolean %) true)
;;=> true

However, only the first one can be used to create a generator using s/gen:

(gen/sample (s/gen boolean?))
;;=> (true true false true true false true false true false)

(gen/sample (s/gen #(instance? Boolean %)))
;;=> ERROR! ("Spec assertion failed")

Spec knows how to generate values for boolean?, but not for our second predicate, even though they amount to the same thing.

Therefore if you happen to have some specs lying around that use idiosyncratic predicates like this then you may have to rework them a little to use more standard ones.

The generation-aware functions you can use are:

  • Spec-defining functions in clojure.spec.alpha, such as coll-of, keys, int-in, inst-in, etc.
  • Clojure core predicates listed on the Clojure Cheat Sheet under “Predicates with test.check generators”
  • Hash sets (i.e. standard Clojure sets, not sorted-set instances; the generator just chooses a random element from the set)

Later on, we’ll look at using s/with-gen to provide spec with knowledge of how to generate values for a predicate.
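To give a feel for how far these generation-aware building blocks get us, here's a small sketch combining a set, s/int-in and s/keys. The ::suit, ::rank and ::card specs are made up for the example.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; A set works as a spec, and its generator picks elements from the set.
(s/def ::suit #{:clubs :diamonds :hearts :spades})

;; s/int-in takes an inclusive start and an exclusive end.
(s/def ::rank (s/int-in 1 14))

;; s/keys composes the generators of its component specs.
(s/def ::card (s/keys :req-un [::suit ::rank]))

(def cards
  (gen/sample (s/gen ::card) 50))

;; Every generated value conforms to the spec it was generated from.
(every? #(s/valid? ::card %) cards)
;;=> true
```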

Using s/and & s/or

As well as allowing us to create compound predicates more succinctly, s/and & s/or have the additional key aspect that they stop a predicate from becoming opaque for s/gen.

For example, compare

(gen/sample (s/gen (s/and integer?
                          #(not= 1 %))))
;;=> (-1 -1 -1 -2 0 -1 0 0 0 4)

with

(gen/sample (s/gen (fn [x]
                     (and (integer? x)
                          (not= 1 x)))))
;;=> ERROR!

Again, in terms of validation these behave the same way, but for generation purposes only the first one works. In the latter case spec only sees an anonymous function; it can’t “peek” inside to see the use of integer?. In the first case though the use of s/and allows spec to recognize this usage; it will generate values using integer?, and internally filter those generated using #(not= 1 %).

s/or behaves similarly. Compare

(gen/sample (s/gen (s/or :integer integer?
                         :boolean boolean?)))
;;=> (-1 -1 false -2 true -1 true 0 false 4)

with

(gen/sample (s/gen (fn [x]
                     (or (integer? x)
                         (boolean? x)))))
;;=> ERROR!

Ordering of arguments in s/and

It’s worth highlighting something we just touched upon for s/and: the first argument passed to s/and is used to determine the “base” generator function; the rest are used to filter the generated results.

For example, compare

(gen/sample (s/gen (s/and integer?
                          #(not= 1 %))))
;;=> (-1 -1 -1 -2 0 -1 0 0 0 4)

with

(gen/sample (s/gen (s/and #(not= 1 %)
                          integer?)))
;;=> ERROR!

In the latter case, spec will look at #(not= 1 %) in order to generate values, but not know what to do with it!

The limits of s/and’s internal filtering

When performing the internal filtering described above, generators created using s/gen will only try a certain number of attempts to generate a value, but after that they will give up. This means that if the secondary predicates in an s/and form are too restrictive, then the generator may rarely work (or not at all). In such cases, the chance of the internally generated values passing the internal filtering is too low.

For example:

(gen/sample (s/gen (s/and integer?
                          #(<= 1000 %)
                          #(<= % 1020))))
;;=> Error: "Couldn't satisfy such-that predicate after 100 tries"

Internally, the created generator will generate integers in the same way as (s/gen integer?) would, and then check if it’s between 1000 & 1020. The chance of this is low enough that after 100 tries the generator fails to generate such a value most of the time.

This means that if our secondary predicates within our s/and are doing more than excluding exceptional cases, then we need to take advantage of other techniques to create valid generators.

Using spec ns functions to create generator-friendly specs

A good rule of thumb to avoid creating failing generators is to favor usage of functions within the clojure.spec.alpha namespace where possible.

For example, the earlier example would be better written using s/int-in:

(gen/sample (s/gen (s/int-in 1000 1021)))
;;=> (1001 1001 1002 1001 1000 1001 1006 1002 1002 1002)

For spec functions like this, rather than internally generating a value only to potentially throw it away due to predicates, spec can instead choose values more intelligently and so can guarantee that a value is generated on each attempt.

Another key way of making better generators is by taking advantage of the optional keyword arguments you can pass to s/coll-of. It’s a function you’re almost certainly already using when making specs, but you might not be aware of the following keys you can use:

  • :kind, for specifying the collection type
  • :count, for specifying a collection which must have an exact number of elements
  • :min-count & :max-count, for (you guessed it) specifying a minimum and/or maximum number of elements for the collection
  • :distinct, for ensuring that all elements in the collection are distinct
  • :gen-max, for specifying a particular maximum number of elements to be generated.

:gen-max in particular is good to be aware of; by default, :gen-max is 20, meaning your generated collections will contain at most 20 elements. This default is presumably in place to avoid a problem where deeply nested collections can contain an unmanageable number of elements. However, if your test needs to use larger collections in order to be meaningful, then you will need to increase this value.

(See the docstring for s/every for more details.)
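A quick way to see the effect of :gen-max is to sample a bunch of collections and look at the largest one. This is just a sketch; max-generated-count is a helper made up for the example, and the sample size of 200 is arbitrary.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Hypothetical helper: the size of the largest collection
;; found in 200 samples generated from the given spec.
(defn max-generated-count [spec]
  (->> (gen/sample (s/gen spec) 200)
       (map count)
       (apply max)))

;; With the default :gen-max this never exceeds 20.
(def default-max
  (max-generated-count (s/coll-of int?)))

;; With a raised :gen-max we should see larger collections,
;; but never more than 100 elements.
(def raised-max
  (max-generated-count (s/coll-of int? :gen-max 100)))

[default-max raised-max]
```

Bear in mind that :gen-max only affects generation; a collection with more elements than :gen-max is still valid according to the spec.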

For example, suppose we want a spec for

  • A collection of ints in the range 1000 to 1020 (inclusive)
  • Min collection size 10
  • All elements distinct

The generator will almost certainly fail if we use predicate functions for this:

(gen/sample
 (s/gen
  (s/and (s/coll-of (s/int-in 1000 1021))
         #(<= 10 (count %))
         #(apply distinct? %))))
;;=> Error!

But the generation will work just fine when making use of the equivalent arguments to s/coll-of:

(gen/sample
 (s/gen
  (s/coll-of (s/int-in 1000 1021)
             :min-count 10
             :distinct true)))
;;=> ([1016 1010 1003 1000 1013...

Creating new generators with s/with-gen & gen/fmap

As we’ve seen, we can get very far simply by composing together predicates and various spec functions. Sooner or later though we’ll come across data that requires us to write our own generators.

Thankfully though, we rarely (if ever) need to write a generator from scratch. The two main tools we’ll use for creating generators are:

  • s/with-gen for declaring a spec with an accompanying generator
  • gen/fmap for creating a generator based on another generator, which simply transforms each value generated

For example, suppose we want to create a spec & generator for strings which are valid UUIDs. If we simply use s/and here then, as we’ve seen many times now, we’ll have a valid spec but a broken generator:

(gen/sample (s/gen (s/and string?
                          parse-uuid)))
;;=> Error: "Couldn't satisfy such-that predicate after 100 tries"

Recall that because of s/and, internally the generator will generate values based upon string? and then filter values for which parse-uuid returns a truthy value (i.e. valid UUID strings). This doesn’t work because the chance of any random string being a valid UUID is extremely small. Nor are there any alternate spec predicate functions or variants we can take advantage of.

In this situation we can reach for s/with-gen and gen/fmap, like so:

(s/def ::uuid-str
  (s/with-gen
    (s/and string? parse-uuid)
    #(gen/fmap str (s/gen uuid?))))

(gen/generate (s/gen ::uuid-str))
;;=> "989e05e3-59ae-4e70-939f-ceda36f70cfd"

(s/valid? ::uuid-str "989e05e3-59ae-4e70-939f-ceda36f70cfd")
;;=> true

Here we declare a spec with its name in the usual way using s/def, but we wrap our spec definition with s/with-gen. This function takes two arguments: the predicate which makes up our spec, and a no-argument function which returns a generator when called. (Note the # used to place our generator definition within a parameterless function literal.) Our specified generator will now be returned whenever s/gen is called on our spec.

gen/fmap simply returns a new generator which uses the generator it’s given to generate values, then applies the passed function f to each generated value. So here our new generator will generate java.util.UUIDs and then convert each one to a string using str.
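
The same pattern applies to other "formatted string" specs. As a further sketch, here is a hypothetical ::int-str spec (not part of the original example) for strings that consist of an optional minus sign followed by digits; the generator produces integers and maps str over them:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Hypothetical spec: strings made up of an optional minus sign and digits
(s/def ::int-str
  (s/with-gen
    (s/and string? #(re-matches #"-?\d+" %))
    ;; Generate integers, then turn each one into its string form
    #(gen/fmap str (s/gen int?))))

(gen/sample (s/gen ::int-str) 5)

(s/valid? ::int-str "42")
;;=> true

(s/valid? ::int-str "4x2")
;;=> false
```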

Using gen/fmap for better performance

We don’t necessarily need to wait until our generator fails to generate values before reaching for gen/fmap (or indeed, the other techniques we’ve looked at so far); we can benefit from creating our own generators simply to get better performance for our tests.

While the standard wisdom of avoiding premature optimization still generally applies to generative testing, I’d argue that it’s worth prioritizing performance concerns a little higher than you normally would, because of our need to run so many iterations. As you add more and more generative tests, you may find your test suite slowing down, in which case your tests, specs and/or generators may need a bit of TLC.

For example, suppose we have created a simple spec for collections of integers that are multiples of 10, using s/and:

(s/def ::multiples-of-10
  (s/coll-of (s/and integer?
                    #(= 0 (rem % 10)))
             :gen-max 200))

This works well enough, but performance could be better:

(time
 (do
   (doall
    (gen/sample (s/gen ::multiples-of-10)
                200))
   nil))
;;=> ~500ms

Recalling how s/and affects generation, notice that for the ::multiples-of-10 generator the #(= 0 (rem % 10)) predicate acts as a filter for the integers generated by integer?; a lot of results are being thrown away, which has a performance cost.

We can avoid wasting these generation iterations by ensuring that every raw generated value satisfies the spec: we can generate an integer, then multiply that integer by 10 to ensure that it is indeed a multiple of 10, using fmap:

(time
 (do
   (doall
    (gen/sample
     (gen/fmap
      (fn [coll]
        (mapv #(* 10 %) coll))
      ;; quot keeps the range bounds as integers
      (s/gen (s/coll-of
              (s/int-in
               (quot Long/MIN_VALUE 10)
               (quot Long/MAX_VALUE 10))
              :gen-max 200)))
     200))
   nil))
;;=> ~100ms

This version is noticeably faster since we’re no longer wasting cycles throwing away generated values.

Here we explicitly create our “inner” generator in the second argument to fmap; this simply generates raw integers, but with a limited range so that we don’t get integer overflows when the mapping function multiplies by 10.

Since we’ve made a more performant generator, we may as well use s/with-gen to redefine our spec to use it as its generator:

(s/def ::multiples-of-10
  (s/with-gen
    (s/coll-of (s/and integer?
                      #(= 0 (rem % 10))))
    (fn []
      (gen/fmap
       (fn [coll]
         (mapv #(* 10 %) coll))
       ;; quot keeps the range bounds as integers
       (s/gen
        (s/coll-of
         (s/int-in
          (quot Long/MIN_VALUE 10)
          (quot Long/MAX_VALUE 10))
         :gen-max 200))))))

Spec generator quirks & gotchas

There are some quirks about certain spec generators that are good to know about; in some cases you may wish to create a generator directly from the clojure.spec.gen.alpha namespace instead.

(s/gen string?) only generates alphanumeric characters

(s/gen string?) will never give you characters other than letters and digits. This is a real shame, because it’s when you introduce other characters that you can really help expose bugs in your string-processing functions.

Instead, consider using gen/string-ascii, which generates from the ASCII range of characters, or gen/string, which can generate even unprintable characters.
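
One way to apply this, sketched here with a hypothetical ::ascii-str spec, is to keep string? as the predicate but swap in gen/string-ascii as the generator via s/with-gen:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Hypothetical spec: still validates any string, but generates ASCII
;; strings that can include punctuation and spaces
(s/def ::ascii-str
  (s/with-gen string? #(gen/string-ascii)))

(gen/sample (s/gen ::ascii-str) 5)
```

The spec still accepts every string when validating; only the generated data changes.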

The default maximum size for s/coll-of generators is 20

As we noted earlier, by default generators created from an s/coll-of spec have a gen-max option which defaults to 20. Be sure to increase this if handling larger collection sizes is key for your test.

s/coll-of VS gen/vector sizing behavior

These two generator types use the size parameter differently:

  • s/coll-of uses size to scale the contained elements, but the collection size is completely random
  • gen/vector uses size to determine the size of the collection

For example:

(gen/sample (s/gen (s/coll-of integer?)) 3)
;; => ([-1 -1 -1 -1 -1 -1 0 0 -1 0 -1 0 -1 0 0 0 0 -1 0 -1]
;;     [-1 0 -1 -1 -1 -1 -1 -1 0 0]
;;     [0 0 0 -1 0 1 -1 0 -1 -1 0 -1])

(gen/sample (gen/vector (gen/large-integer)) 3)
;; => ([] [] [1 1])

This means that using gen/vector may be preferable if your function under test is more sensitive to the size of the collection than the contained elements.
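
If you want this sizing behavior while still working through a spec, one option is to attach gen/vector with s/with-gen. The ::sized-ints name below is hypothetical and used only for this sketch:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Hypothetical spec: validates like (s/coll-of integer?), but its
;; generator grows the collection along with test.check's size parameter
(s/def ::sized-ints
  (s/with-gen
    (s/coll-of integer?)
    #(gen/vector (gen/large-integer))))

;; Collection sizes for the first 10 samples (sizes 0 through 9)
(map count (gen/sample (s/gen ::sized-ints) 10))
```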

Using ad-hoc generated data in example-based tests

After getting so comfortable with making generators, you may be wondering about using them to generate data for your example-based tests. You can do this, but be very careful! Remember that the key features provided by test.check are crucial for creating tests that are reliable and useful. I often see example-based tests with some or all of their inputs based on direct calls to gen/generate, presumably to avoid the drudgery of hand-crafting example inputs for large input maps. Sadly, such tests are often flaky, because every run exercises a different randomly generated input.

If you simply want to avoid hand-crafting data for example-based tests, use one of the following techniques to use ad-hoc data generation safely:

  • save your generated test data: you can call the generator in a REPL session, and then save the output into your test file
  • generate using a seed: use the clojure.test.check.generators namespace directly (as opposed to the lazily-loaded clojure.spec.gen.alpha version) and call the 3-parameter version of generate, which lets you specify a seed. This way each test run will be completely consistent. This is useful for extremely large input collections which you don’t want to save as a literal in your test file.
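
The seeded approach can be sketched as follows. The ::order spec is a hypothetical stand-in for a large input map, and the size (30) and seed (42) are arbitrary illustrative values:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.test.check.generators :as tcgen])

;; Hypothetical spec standing in for a large input map
(s/def ::id pos-int?)
(s/def ::tags (s/coll-of keyword?))
(s/def ::order (s/keys :req-un [::id ::tags]))

;; Arguments are: generator, size, seed
(def example-order
  (tcgen/generate (s/gen ::order) 30 42))

;; The same size and seed yield the same value
(= example-order (tcgen/generate (s/gen ::order) 30 42))
;;=> true
```

Bear in mind that the value produced for a given seed may change if you alter the spec or upgrade spec or test.check, so avoid asserting against the exact generated contents.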

Conclusion

test.check is a generative testing library with great integration with spec. If your code already makes use of spec, then you’re already halfway to creating some powerful tests that can help you expose edge cases that you might not have easily found otherwise.

The main challenges when creating generative tests are thinking of good general properties to check and creating reliable & efficient generators. Getting comfortable with these skills will reward you with the ability to get far more reliable unit tests than with example-based tests alone.