A Practical Guide to test.check
Here’s a pragmatic guide to generative testing in Clojure using test.check, oriented around spec.
One of spec’s main selling points is that it can be used for validation,
instrumentation and generative testing, but in practice I don’t see very many
codebases using spec also taking advantage of its integration with test.check.
So if you’re already using spec - or at least familiar with it - then this guide
is for you. But even if not, hopefully this guide will still give you a much
better understanding of test.check
and generative testing in Clojure.
Table of Contents
- Section 1: Quick Start
- Section 2: Key Features
- Section 3: Tips for writing good generative tests
- Section 4: Creating generators
  - Key functions recognized by spec
  - Using `s/and` & `s/or`
  - Ordering of arguments in `s/and`
  - The limits of `s/and`’s internal filtering
  - Using spec ns functions to create generator-friendly specs
  - Creating new generators with `s/with-gen` & `gen/fmap`
  - Using `gen/fmap` for better performance
  - Spec generator quirks & gotchas
  - Using ad-hoc generated data in example-based tests
- Conclusion
Section 1: Quick Start
What is `test.check`?
In short, `test.check` is a Clojure library for performing generative tests.
Generative tests - also known as property-based tests - are tests that run our
code under test against random inputs, then check general properties that should
apply across the range of possible randomly generated data. In contrast, we
refer to traditional unit tests that operate on fixed inputs as example-based
tests.
Your first generative test
To whet your appetite, let’s start with a simple example.
First off, you’ll need the following dependencies:
;; Main test.check dependency
org.clojure/test.check {:mvn/version "1.1.1"}
;; Optional; extra tools, and (IMO) better clojure.test integration
com.gfredericks/test.chuck {:mvn/version "0.2.13"}
Then we can create the following failing test:
(require '[clojure.spec.alpha :as s]
'[clojure.test :refer [is testing]]
'[clojure.test.check.clojure-test :refer [defspec]]
'[com.gfredericks.test.chuck.clojure-test :as chuck])
(defn broken-sort [coll]
(if (some #{13} coll)
nil
(sort coll)))
(defspec broken-sort-gen-test
(chuck/for-all [input-coll (s/gen (s/coll-of int?))]
(let [output-coll (broken-sort input-coll)]
(testing "Result is in ascending order"
(when (seq input-coll)
(is (apply <= output-coll))))
(testing "The sorted collection contains the same elements"
(is (= (group-by identity input-coll)
(group-by identity output-coll)))))))
(broken-sort-gen-test)
; => Fails! (95% of the time...)
The key things here are:
- We define a test.check unit test using `defspec`. (NB the naming is a bit unfortunate - this has nothing to do with clojure.spec!)
- We can create generators from specs using `s/gen`; we create a spec for a collection of integers on the fly with `(s/coll-of int?)`, then grab a test data generator for it using `s/gen`
- The `for-all` binding vector binds the generated value for each test run; the assertions are run every time a new test value is generated, and by default 100 test values are generated per test run

In your REPL try modifying `broken-sort`; you should find that the only implementation that consistently passes the test is a call to `sort`. Hopefully you can see already how powerful generative testing is; we can really nail down the correctness of the behavior we want with just a single test!
Formatting & linting
Out of the box, the macros we use here don’t play very nicely with cljfmt or clj-kondo. Thankfully though this is easily fixed with a couple of config files:
`.cljfmt.edn`:
{:extra-indents {for-all [[:inner 0]]}}
`.clj-kondo/config.edn`:
{:lint-as {clojure.test.check.clojure-test/defspec clojure.test/deftest
com.gfredericks.test.chuck.clojure-test/for-all clojure.test.check.properties/for-all}}
Vanilla vs `test.chuck`’s `for-all`
In the above example I’ve used `test.chuck`’s version of `for-all`. In short, this is because the vanilla version of `for-all` doesn’t play nicely with using `is` assertions in tests; instead, `test.check`’s built-in `for-all` macro uses the overall truthiness of the body expression.
So, the test above rewritten to use the vanilla `for-all` would look like:
(require '[clojure.test.check.properties :as p])
(defspec broken-sort-gen-test-vanilla-for-all
(p/for-all [input-coll (s/gen (s/coll-of int?))]
(let [output-coll (broken-sort input-coll)]
(and
(testing "Result is in ascending order"
(or (empty? input-coll)
(and (seq output-coll)
(apply <= output-coll))))
(testing "The sorted collection contains the same elements"
(= (group-by identity input-coll)
(group-by identity output-coll)))))))
In particular, note that we need to group both our assertions within an `and` form. This isn’t too bad on its own, but since we’re trying to integrate with `clojure.test` here (by using `defspec`), it makes sense to me to prefer an approach that lets us write tests more consistently.
Therefore from now on I’ll only use the `test.chuck` version of `for-all`. I’ll always explicitly prefix it with the ns alias `chuck` for the sake of clarity, but in your own code you’ll probably want to `refer` whichever version of `for-all` you choose.
Section 2: Key Features
Usually when unit testing we are rightfully wary of using randomly-generated data, since when used naively it can lead to flaky tests and hard-to-diagnose failures. Proper generative testing differs from naive usage of random data by ensuring deterministic behavior and ease of failure analysis.
Test.check makes reliable generative tests possible through the following key features:
- Intelligently calculates the simplest failing inputs - test.check calls this shrinking
- Runs multiple iterations per test run - this helps prevent flakiness
- Intelligently generates data - “simpler” data is generated first
The power of shrinking
One of the most powerful features of `test.check` is shrinking: rather than simply spitting out the first failure it finds, it does some further work to automagically find a simpler failing test input for us.
To see this in action, let’s take a look at the failing test output for our earlier `broken-sort-gen-test` example:
{:shrunk
{:total-nodes-visited 11,
:depth 3,
:pass? false,
:result false,
:result-data nil,
:time-shrinking-ms 1,
:smallest [{input-coll [13]}]},
:failed-after-ms 5,
:num-tests 13,
:seed 1700759647930,
:fail [{input-coll [15 36 -29 93 2 13 -756 -649 360 2]}],
:result false,
:result-data nil,
:failing-size 12,
:pass? false}
Under the `:fail` key we have `[{input-coll [15 36 -29 93 2 13 -756 -649 360 2]}]`, which shows us that this particular test run failed with an input collection of `[15 36 -29 93 2 13 -756 -649 360 2]`. However, test.check automatically does some further digging for us to find that the simplest input collection that fails is `[13]`.
This is very powerful! This result might feel obvious given how contrived our example is, but in more realistic circumstances this capability is very useful indeed:
- We’ve generated a failing test case; even before we take any shrinking into account, right off the bat we’ve found a failure that we might not have found through traditional example-based testing.
- This tiny collection is much easier to debug than the initial failure
- We can infer further information from the initial & shrunk data:
  - Given that the simplified collection doesn’t contain the other numbers from the initial failing run, we can infer that those elements are most likely irrelevant to the test failure
  - Since `[]` is simpler than `[13]`, but test.check didn’t shrink all the way down to `[]`, we know that an empty vector would pass.
Multiple iterations per test run
One of the key parts of generative testing is avoiding the potential flakiness we might introduce through naive usage of random data. One of the simpler ways that test.check achieves this is by running multiple iterations per test run.
The key things to know here are:
- For tests created using `defspec`, when not specified the number of iterations is the value of `clojure.test.check.clojure-test/*default-test-count*`, which defaults to `100`.
- Otherwise, you can manually specify the number of iterations you want per test, like so: `(defspec my-test 200 ...)`
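For example, here’s a sketch (with an invented, trivially-true property) of a test that runs 500 iterations per run:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.test :refer [is]]
         '[clojure.test.check.clojure-test :refer [defspec]]
         '[com.gfredericks.test.chuck.clojure-test :as chuck])

;; 500 iterations per run instead of the default 100
(defspec addition-is-commutative 500
  (chuck/for-all [[a b] (s/gen (s/tuple int? int?))]
    (is (= (+ a b) (+ b a)))))
```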
When creating generative tests you might discover some inconsistent failures; if so, one thing you can do to make the failures happen with more regularity is to up the number of iterations. However, there’s only so much juice you can squeeze out of this before you make your test runs impractically slow. So if upping the iterations isn’t enough on its own, you probably need to understand test.check’s concept of sizing for its generators.
Size matters
A key feature of test.check’s data generators is that they each accept a size parameter that places bounds on the resulting data. Test.check uses this to generate smaller (or rather, simpler) data towards the start of a test run, then uses larger and larger size bounds as the test run goes on. This helps to make test runs more consistent and, in concert with shrinking, makes failure cases simpler.
In practice, you will rarely if ever use this size parameter directly; as mentioned above, it’s mostly used behind the scenes by test.check.
Sizing in action
We can experiment with size directly by using the `generate` function from the ns `clojure.test.check.generators`, which optionally accepts a size parameter.
To demonstrate this more easily though, we can create a small helper function, like so:
(require '[clojure.test.check.generators :as tgen])
(defn exercise-sizes
([generator sizes]
(exercise-sizes generator sizes 100000))
([generator sizes num-iterations]
(->> sizes
(map (fn [size]
[size
(->> (repeatedly num-iterations
#(tgen/generate generator size))
(into (sorted-set)))]))
(into (sorted-map)))))
(exercise-sizes (s/gen integer?) (range 5))
;;=>
{0 #{-1 0}
1 #{-1 0}
2 #{-2 -1 0 1}
3 #{-4 -3 -2 -1 0 1 2 3}
4 #{-8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7}}
(Note: we’re using `test.check`’s generators namespace directly here, aliased as `tgen`, since it contains some extra arity versions of functions that sadly aren’t included in the usual `clojure.spec.gen.alpha` namespace. We generally prefer the latter because it lazy-loads the generator functionality, which then lets us specify our generators alongside our specs in our production code, but without needing to include the `test.check` dependency in a production build.)
Size is abstract
First of all, we can observe that the size parameter is abstract; i.e. although larger size values generally result in a wider range of inputs, it’s not as simple as max value = size, as we can see in the ranges of values produced by `(s/gen integer?)` above.
Some generators ignore size
Some generators take no notice of sizing and so just generate values within the same range regardless.
A good example of this is `(s/gen uuid?)`, which generates UUIDs completely randomly in the usual way regardless of sizing:
(exercise-sizes (s/gen uuid?) (range 3) 2)
;;=>
{0
#{#uuid "dfca3a57-3100-4c5c-8c90-5dbaa93d859b"
#uuid "212df58f-eb0d-487b-8225-1c61ad56e6d8"}
1
#{#uuid "eeb9d663-fb88-45ee-a4ba-9b184720425a"
#uuid "185f2d24-88a7-43d4-9b9b-b715d3ebe0ae"}
2
#{#uuid "d837eafb-0803-4a99-847a-e4533a8db643"
#uuid "fc3bce75-6684-4438-a8b1-9ae589aa52e3"}}
A less obvious example is `(s/gen boolean?)`, which generates `true` or `false` with equal likelihood regardless of size:
(exercise-sizes (s/gen boolean?) (range 5))
;;=>
{0 #{false true}
1 #{false true}
2 #{false true}
3 #{false true}
4 #{false true}}
Increasing size of test runs
For test runs, the size used for each iteration cycles from 0 through to 199, then cycles back to 0 again, i.e.:
- iteration 0: size 0
- iteration 1: size 1
- iteration 2: size 2
- ...
- iteration 199: size 199
- iteration 200: size 0
- iteration 201: size 1
In particular, this means that it’s a good idea to specify your number of iterations as 200 or more, since otherwise you don’t cover as big a range of possible data; remember that `defspec` defaults to 100 iterations if you don’t specify it. (This is an odd choice by the `defspec` implementation IMO, especially since the underlying raw test.check test functions default to 200.)
We can see this in action by making use of the `sample` helper function, which behaves in the same way:
(require '[clojure.spec.gen.alpha :as gen])
(gen/sample (s/gen integer?))
;;=> (0 -1 0 -1 -1 1 -7 -18 -5 -112)
We can see that the deeper in the sequence the values appear, the more likely they are to be larger.
To make this a bit more obvious, let’s specify a larger number of values to generate:
(gen/sample (s/gen integer?) 20)
;;=> (-1 -1 0 -4 0 -7 -4 15 -1 17 -479 440 -15 -1615 64 -2 213 924 -1343 11157)
Section 3: Tips for writing good generative tests
The bare minimum: exposing exceptions
The good news is that even if we can’t think of some good general properties to check, at the very least we can verify that our functions get as far as returning something without blowing up with an exception.
For example, suppose we are testing a function like this speed calculation function:
(defn speed [distance time]
(/ distance time))
This is the sort of function where the real meat of the logic is arguably better demonstrated through traditional example tests. However, we can chuck some numbers through it to expose the divide-by-zero error:
(defspec speed-gen-test 200
(chuck/for-all [[distance time] (s/gen (s/tuple double? double?))]
(speed distance time)
(is true "Hey, at least we didn't blow up!")))
(speed-gen-test)
;;=> java.lang.ArithmeticException: Divide by zero
...
:smallest [{distance 1.0, time 0.0}]
...
This shows that our function’s spec needs to be improved; we need to exclude zero:
(s/def ::non-zero-double
(s/and double?
(complement zero?)))
(defn speed [distance time]
(/ distance time))
(defspec speed-gen-test 200
(chuck/for-all [[distance time] (s/gen (s/tuple double? ::non-zero-double))]
(speed distance time)
(is true "Hey, at least we didn't blow up!")))
(speed-gen-test)
;;=> OK
Improving input & output specs with instrumentation
We can improve our generative tests indirectly by making use of instrumentation - i.e. automatic validation of function inputs against specs. This helps us to discover functions whose input specs are too strict. Then as we widen the scope of our specs, our corresponding generative tests will cover more ground.
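(For the spec assertion failures shown in this section to actually be thrown, instrumentation needs to be switched on first; the standard way is via `clojure.spec.test.alpha`:)

```clojure
(require '[clojure.spec.test.alpha :as st])

;; Instrument all currently-loaded functions that have fdef specs;
;; calls whose arguments fail their :args spec will now throw
(st/instrument)
```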
For example, suppose we were trying to spec out the `sort` function, using almost the same generative test as for our initial `broken-sort` example. (However, we’ll create a duplicate `my-sort` function in order not to break the real `sort`, since doing so would break our REPL in surprising ways!)
(s/def ::sortable
(s/coll-of integer?))
(defn my-sort [coll]
(sort coll))
(s/fdef my-sort
:args (s/cat :coll ::sortable))
(my-sort [2 1])
;;=> (1 2)
(defspec my-sort-gen-test
  (chuck/for-all [input-coll (s/gen ::sortable)]
    (let [output-coll (my-sort input-coll)]
      (testing "Result is in ascending order"
        (when (seq input-coll)
          (is (apply <= output-coll))))
      (testing "The sorted collection contains the same elements"
        (is (= (group-by identity input-coll)
               (group-by identity output-coll)))))))
(my-sort-gen-test) ;;=> [passes]
(my-sort [1.2 1.1])
;;=> Exception: Spec assertion failed!
Here we’ve naively assumed we’re only sorting integers, but our instrumentation has helped us find that this assumption was incorrect. (In practice, the instrumentation would more likely be exposing issues during integration testing or ad-hoc manual tests rather than REPL interaction - but you get the idea.)
So, let’s expand our `::sortable` spec a little - why not expand it out to any valid number? This gets past our instrumentation, but our generative test now fails:
(s/def ::sortable
(s/coll-of number?))
(my-sort [2 1])
;;=> (1 2)
(my-sort [1.2 1.1])
;;=> (1.1 1.2)
(my-sort-gen-test)
;;=> {:smallest [##NaN]}
Our generative test has shown us that our new `::sortable` spec, meant to include all numbers, also (unintuitively) includes `##NaN` (Not a Number). But due to quirks of `##NaN`, it doesn’t play well with sorting. And sure enough, the Clojure comparators guide recommends removing all occurrences of `##NaN` before sorting a collection.
Therefore it makes sense for our `::sortable` spec to exclude `##NaN`. With this done, our generative test passes again:
(s/def ::sortable
(s/coll-of (s/and number?
(complement NaN?))))
(my-sort-gen-test)
;;=> [passes]
If we were depending on example-based tests alone it could have been all too easy to just add or update tests that try some plain doubles, while forgetting about `##NaN`.
The key thing here is that instrumentation works well in concert with generative tests to help us get our specs just right; instrumentation helps expose places where our specs are too strict, while generative tests help us ensure that we don’t expand our specs out too widely in response.
Finding general properties to check
One of the main challenges of writing generative tests is finding good general properties to check, without writing a test so sophisticated that it pretty much duplicates the logic of your code under test.
In most cases we can’t nail down the functionality as completely as we can with something like a sort function. But we can still get quite a lot of coverage by testing the following things:
1. Properties about the input
2. Properties about the output
3. Properties about how the output relates to the input
We get (1) from instrumentation, so we’re going to focus here on (2) & (3).
For example, suppose we were writing a generative test for the camel-snake-kebab library’s `->kebab-case` function. In case you’re not already aware, this converts variously-cased strings to kebab case, like so:
(require '[camel-snake-kebab.core :as csk])
(csk/->kebab-case "fooBar") ;;=> "foo-bar"
(csk/->kebab-case "Foo_Bar") ;;=> "foo-bar"
We can’t write a test that completely describes the expected outputs given a generalized input without duplicating the logic implemented by `->kebab-case` in the first place. But we can at least pin down some general properties, even if they don’t completely describe the expected behavior.
Let’s start with the following:
- a) Result contains no uppercase characters or underscores
- b) Ordering of letters is preserved
Which we can implement as a generative test like so:
(require '[clojure.string :as str])

(defspec kebab-case-gen-test 200
(chuck/for-all [input (gen/string)]
(let [output (csk/->kebab-case input)]
(testing "Result contains no uppercase characters or underscores"
(is (re-matches #"[^A-Z_]*" output)))
(testing "Letter ordering is preserved"
(letfn [(lower-and-strip [s]
(-> (str/lower-case s)
(str/replace #"[-_]" "")))]
(is (= (lower-and-strip input)
(lower-and-strip output))))))))
However, this test fails, with a simplest failing input of `" "` (a single space).
(You may notice that we’re using `gen/string` here as our generator rather than `(s/gen string?)`; this is because the latter only generates alphanumeric characters.)
Let’s try this input out in the REPL to see what the problem is:
(csk/->kebab-case " ") ;;=> ""
Aha! We’ve stumbled across the fact that `->kebab-case` also strips whitespace:
(csk/->kebab-case "foo bar") ;;=> "foo-bar"
Let’s update our test to reflect this:
(defspec kebab-case-gen-test 200
(chuck/for-all [input (gen/string)]
(let [output (csk/->kebab-case input)]
(testing (str "Result contains no uppercase characters, "
"underscores or whitespace")
(is (re-matches #"[^A-Z_\s]*" output)))
(testing "Letter ordering is preserved"
(letfn [(lower-and-strip [s]
(-> (str/lower-case s)
(str/replace #"[-_\s]" "")))]
(is (= (lower-and-strip input)
(lower-and-strip output))))))))
This passes - great!
What this shows is that generative testing helps us find test cases that we might not have thought of otherwise. We could have written plenty of example-based tests where we didn’t take into account whitespace, but using test.check has helped to expose this behavior.
Each iteration needs to be fast
One of the main restrictions of generative testing is that you only get the most out of it if each iteration is lightning-quick. Too slow, and you’re forced to choose between either bloating your test suite’s run time, or dropping the number of data generation iterations, weakening the test’s reliability.
Ideally the function under test should be a fast, pure function; failing that, you’ll need to stub out any slow operations such as I/O through techniques such as parameterization or using `with-redefs`.
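As a sketch (the `fetch-price` and `total-price` functions here are invented for illustration), a slow dependency can be stubbed out with `with-redefs` so that each generative iteration stays fast:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.test :refer [is]]
         '[clojure.test.check.clojure-test :refer [defspec]]
         '[com.gfredericks.test.chuck.clojure-test :as chuck])

;; Hypothetical slow dependency - imagine this makes an HTTP call
(defn fetch-price [product-id]
  (Thread/sleep 1000)
  42)

(defn total-price [product-ids]
  (reduce + 0 (map fetch-price product-ids)))

(defspec total-price-gen-test 200
  (chuck/for-all [ids (s/gen (s/coll-of string?))]
    ;; Stub out the slow call so each iteration stays fast
    (with-redefs [fetch-price (constantly 10)]
      (is (= (* 10 (count ids)) (total-price ids))))))
```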
Don’t neglect example-based tests
As powerful as generative tests are, we shouldn’t suddenly turn our back on good old example-based tests. Let’s consider some of the advantages of them to see why:
A picture tells a thousand words
Automated tests don’t just help ensure correctness, they also act as documentation. And as clojuredocs.org shows, good documentation is greatly aided by a few concrete examples to help you really grok what a function does.
Better test output
Unfortunately, even when using `test.chuck`’s version of `for-all`, we don’t get the same detailed output for assertion failures that we would get when running an example-based test. `test.chuck` allows us to use `is`, but we still don’t get told exactly which assertion failed.
Example-based tests give much better output for the particulars of a given test failure. Therefore when encountering a generative test failure it can be worth creating an example-based test for that test, even just temporarily, in order to better understand what’s going wrong.
Easier debugging
Similar to the previous point, example-based tests are much easier to debug using interactive debuggers, logging and other traditional debugging techniques. This is because they will usually be making a single call to your function under test, or at least not very many.
In contrast, generative tests are impractical for interactive debugging due to the large number of executions and unpredictable input, so this is another good reason to create dedicated example-based tests for particular failure cases you want to investigate.
Fast
“Everything is fast for small n”. This is true up to a point; while it’s certainly possible to create slow, bloated example-based tests, it’s much easier to make them fast than generative tests. If nothing else you have a bit more leeway to perform some I/O or other slow operations within your test, which gives you more flexibility.
Easier to write
If you find yourself agonizing too long over how to write the perfect generative test for your function that would specify its behavior completely & elegantly, then you may be better off just writing some example-based tests instead; perhaps you can come back to the generative version later.
Don’t let perfect be the enemy of good, and don’t feel like you’re copping out by writing example-based tests.
Section 4: Creating generators
One key challenge for creating generative tests is writing effective generators.
We’ve seen already that we can get very far with creating generators just by calling `s/gen` on a spec. This works for simple generators, but there are some key things to know to avoid getting stuck.
We’ll start by looking at how to make the most of the generators that you can create directly from specs, then look at a few more advanced techniques for creating more specialized generators.
Key functions recognized by spec
The first potential obstacle to be aware of is that only certain functions & predicates known by `s/gen` can be used for creating generators. We can create a spec from any predicate, but any given predicate is opaque from the perspective of spec unless spec “knows” about it.
For example, the following behave the same with respect to validation:
(s/valid? boolean? true)
;;=> true
(s/valid? #(instance? Boolean %) true)
;;=> true
However, only the first one can be used to create a generator via `s/gen`:
(gen/sample (s/gen boolean?))
;;=> (true true false true true false true false true false)
(gen/sample (s/gen #(instance? Boolean %)))
;;=> ERROR! ("Spec assertion failed")
Spec knows how to generate values for `boolean?`, but not for our second predicate, even though they amount to the same thing.
Therefore if you happen to have some specs lying around that use idiosyncratic predicates like this, you may have to rework them a little to use more standard ones.
The generation-aware functions you can use are:
- Spec-defining functions in `clojure.spec.alpha`, such as `coll-of`, `keys`, `int-in`, `inst-in`, etc.
- Clojure core predicates listed on the Clojure Cheat Sheet under “Predicates with test.check generators”
- Hash sets (i.e. standard Clojure sets, not sorted-set instances; the generator just chooses a random element from the set)
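For instance, a set literal works as both a predicate and a generator; each generated value is a random element of the set:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Sets are valid specs; the generator picks random elements
(gen/sample (s/gen #{:small :medium :large}) 5)
```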
Later on, we’ll look at using `s/with-gen` to provide spec with knowledge of how to generate values for a predicate.
Using `s/and` & `s/or`
As well as allowing us to create compound predicates more succinctly, `s/and` & `s/or` have the additional key benefit that they stop a predicate from becoming opaque to `s/gen`.
For example, compare
(gen/sample (s/gen (s/and integer?
#(not= 1 %))))
;;=> (-1 -1 -1 -2 0 -1 0 0 0 4)
with
(gen/sample (s/gen (fn [x]
(and (integer? x)
(not= 1 x)))))
;;=> ERROR!
Again, in terms of validation these behave the same way, but for generation purposes only the first one works. In the latter case spec only sees an anonymous function; it can’t “peek” inside to see the use of `integer?`. In the first case though, the use of `s/and` allows spec to recognize this usage; it will generate values using `integer?`, and internally filter those generated using `#(not= 1 %)`.
`s/or` behaves similarly:
(gen/sample (s/gen (s/or :integer integer?
:boolean boolean?)))
;;=> (-1 -1 false -2 true -1 true 0 false 4)
with
(gen/sample (s/gen (fn [x]
(or (integer? x)
(boolean? x)))))
;;=> ERROR!
Ordering of arguments in `s/and`
It’s worth highlighting something we just touched upon for `s/and`: the first argument passed to `s/and` is used to determine the “base” generator function; the rest are used to filter the generated results.
For example:
(gen/sample (s/gen (s/and integer?
#(not= 1 %))))
;;=> (-1 -1 -1 -2 0 -1 0 0 0 4)
with
(gen/sample (s/gen (s/and #(not= 1 %)
integer?)))
;;=> ERROR!
In the latter case, spec will look at `#(not= 1 %)` in order to generate values, but won’t know what to do with it!
The limits of `s/and`’s internal filtering
When performing the internal filtering described above, generators created using `s/gen` will only make a certain number of attempts to generate a value, after which they give up. This means that if the secondary predicates in an `s/and` form are too restrictive, then the generator may rarely work (or not work at all): the chance of the internally generated values passing the internal filtering is too low.
For example:
(gen/sample (s/gen (s/and integer?
#(<= 1000 %)
#(<= % 1020))))
;;=> Error: "Couldn't satisfy such-that predicate after 100 tries"
Internally, the created generator will generate integers in the same way as `(s/gen integer?)` would, and then check whether each one is between 1000 & 1020. The chance of this is low enough that after 100 tries the generator fails to find such a value most of the time.
This means that if the secondary predicates within our `s/and` are doing more than excluding exceptional cases, then we need to take advantage of other techniques to create valid generators.
Using spec ns functions to create generator-friendly specs
A good rule of thumb for avoiding failing generators is to favor usage of functions within the `clojure.spec.alpha` namespace where possible.
For example, the earlier example would be better written using `s/int-in`:
(gen/sample (s/gen (s/int-in 1000 1021)))
;;=> (1001 1001 1002 1001 1000 1001 1006 1002 1002 1002)
For spec functions like this, rather than internally generating a value only to potentially throw it away due to predicates, spec can instead choose values more intelligently and so can guarantee that a value is generated on each attempt.
Another key way of making better generators is by taking advantage of the optional keyword arguments you can pass to s/coll-of. It’s a function you’re almost certainly already using when making specs, but you might not be aware of the following keys:
- :kind, for specifying the collection type
- :count, for specifying a collection which must have an exact number of elements
- :min-count & :max-count, for (you guessed it) specifying a minimum and/or maximum number of elements for the collection
- :distinct, for ensuring that all elements in the collection are distinct
- :gen-max, for specifying a maximum number of elements to be generated.
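As a quick sketch of how these options combine (the spec name here is hypothetical, assuming the usual s and gen aliases), a spec for a set of exactly three keywords might look like:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; A set of exactly three keywords: :kind constrains the collection
;; type, :count fixes its size (sets are distinct by nature).
(s/def ::three-keywords
  (s/coll-of keyword? :kind set? :count 3))

(gen/sample (s/gen ::three-keywords) 3)
```

Because the generator knows about these constraints up front, every generated value satisfies them directly rather than being filtered after the fact.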
:gen-max in particular is good to be aware of: by default it is 20, meaning your generated collections will contain at most 20 elements. This default is presumably in place to avoid deeply nested collections containing an unmanageable number of elements. However, if your test needs larger collections in order to be meaningful, you will need to increase this value. (See the docstring for s/every for more details.)
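As a minimal sketch (the spec names here are hypothetical), raising :gen-max lets the generator produce collections beyond the default cap of 20 elements:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Default cap: generated collections have at most 20 elements.
(s/def ::ints (s/coll-of int?))

;; Raised cap: collections of up to 100 elements can now be generated.
(s/def ::more-ints (s/coll-of int? :gen-max 100))

(apply max (map count (gen/sample (s/gen ::more-ints) 100)))
;; usually well above 20
```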
For example, suppose we want a spec for
- A collection of ints in the range 1000 to 1020 (inclusive)
- Min collection size 10
- All elements distinct
The generator will almost certainly fail if we use predicate functions for this:
(gen/sample
(s/gen
(s/and (s/coll-of (s/int-in 1000 1021))
#(<= 10 (count %))
#(apply distinct? %))))
;;=> Error!
But the generation will work just fine when making use of the equivalent arguments to s/coll-of:
(gen/sample
(s/gen
(s/coll-of (s/int-in 1000 1021)
:min-count 10
:distinct true)))
;;=> ([1016 1010 1003 1000 1013...
Creating new generators with s/with-gen & gen/fmap
As we’ve seen, we can get very far simply by composing together predicates and various spec functions. Sooner or later though we’ll come across data that requires us to write our own generators.
Thankfully though, we rarely (if ever) need to write a generator from scratch. The two main tools we’ll use for creating generators are:
- s/with-gen, for declaring a spec with an accompanying generator
- gen/fmap, for creating a generator based on another generator, which simply transforms each value generated
For example, suppose we want to create a spec & generator for strings which are
valid UUIDs. If we simply use s/and
here then, as we’ve seen many times now,
we’ll have a valid spec but a broken generator:
(gen/sample (s/gen (s/and string?
parse-uuid)))
;;=> Error: "Couldn't satisfy such-that predicate after 100 tries"
Recall that because of s/and
, internally the generator will generate values
based upon string?
and then filter values for which parse-uuid
returns a
truthy value (i.e. valid UUID strings). This doesn’t work because the chance of
any random string being a valid UUID is extremely small. Nor are there any
alternate spec predicate functions or variants we can take advantage of.
In this situation we can reach for s/with-gen and gen/fmap, like so:
(s/def ::uuid-str
(s/with-gen
(s/and string? parse-uuid)
#(gen/fmap str (s/gen uuid?))))
(gen/generate (s/gen ::uuid-str))
;;=> "989e05e3-59ae-4e70-939f-ceda36f70cfd"
(s/valid? ::uuid-str "989e05e3-59ae-4e70-939f-ceda36f70cfd")
;;=> true
Here we declare a spec with its name in the usual way using s/def
, but we wrap
our spec definition with s/with-gen
. This function takes two arguments: the
predicate which makes up our spec, and a no-argument function which returns a
generator when called. (Note the #
used to place our generator definition
within a parameterless function literal.) Our specified generator will now be
returned whenever s/gen
is called on our spec.
gen/fmap simply returns a new generator which uses the generator it’s given to generate values, then applies the passed function to each generated value. So here our new generator will generate java.util.UUIDs and then convert each one to a string using str.
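As a minimal standalone sketch of gen/fmap (independent of the UUID example; the var name is hypothetical), here we transform a generator of integers into one of their string representations:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; fmap applies str to every integer the inner generator produces.
(def int-str-gen (gen/fmap str (s/gen int?)))

(gen/sample int-str-gen 5)
;; e.g. ("0" "-1" "2" "-3" "0")
```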
Using gen/fmap for better performance
We don’t necessarily need to wait until our generator fails to generate values
before reaching for gen/fmap
(or indeed, the other techniques we’ve looked at
so far); we can benefit from creating our own generators simply to get better
performance for our tests.
While the standard wisdom of avoiding premature optimization still generally applies to generative testing, I’d argue that it’s worth prioritizing performance concerns a little higher than you normally would because of our need to run so many iterations. As you add more and more generative tests, you may find your test suite slowing down, in which case your tests, specs and/or generators may need a bit of TLC.
For example, suppose we have created a simple spec for collections of integers that are multiples of 10, using s/and:
(s/def ::multiples-of-10
(s/coll-of (s/and integer?
#(= 0 (rem % 10)))
:gen-max 200))
This works well enough, but performance could be better:
(time
(do
(doall
(gen/sample (s/gen ::multiples-of-10)
200))
nil))
;;=> ~500ms
Recalling how s/and affects generation, notice that for multiples-of-10’s generator the #(= 0 (rem % 10)) predicate acts as a filter on the integers generated by integer?; a lot of results are being thrown away, which has a performance cost.
We can avoid wasting these generation iterations by ensuring that every raw generated value satisfies the spec: we can generate an integer, then multiply it by 10 to ensure that it is indeed a multiple of 10, using fmap:
(time
(do
(doall
(gen/sample
(gen/fmap
(fn [coll]
(mapv #(* 10 %) coll))
(s/gen (s/coll-of
(s/int-in
(/ Long/MIN_VALUE 10)
(/ Long/MAX_VALUE 10))
:gen-max 200)))
200))
nil))
;;=> ~100ms
This version is noticeably faster since we’re no longer wasting cycles throwing away generated values.
Here we explicitly create our “inner” generator in the second argument to
fmap
; this simply generates raw integers, but with a limited range so that we
don’t get integer overflows when the mapping function multiplies by 10.
Since we’ve made a more performant generator, we may as well use s/with-gen to redefine our spec to use it:
(s/def ::multiples-of-10
(s/with-gen
(s/coll-of (s/and integer?
#(= 0 (rem % 10))))
(fn []
(gen/fmap
(fn [coll]
(mapv #(* 10 %) coll))
(s/gen
(s/coll-of
(s/int-in
(/ Long/MIN_VALUE 10)
(/ Long/MAX_VALUE 10))
:gen-max 200))))))
Spec generator quirks & gotchas
There are some quirks about certain spec generators that are good to know about; in some cases you may wish to reach for a generator from the clojure.spec.gen.alpha namespace instead.
(s/gen string?) only generates alphanumeric characters
(s/gen string?) will never give you characters outside of letters and numbers. This is a real shame, because it’s when you introduce other characters that you can really expose bugs in your string-processing functions.
In contrast, consider using gen/string-ascii
, which generates from the ASCII
range of characters, or gen/string
, which can generate even unprintable
characters.
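To see the difference at the REPL (a quick sketch; recall that in clojure.spec.gen.alpha the test.check generators are wrapped as no-arg functions, hence the extra parens):

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Alphanumeric characters only:
(gen/sample (s/gen string?) 5)

;; Printable ASCII, including punctuation and whitespace:
(gen/sample (gen/string-ascii) 5)

;; Arbitrary characters, including unprintable ones:
(gen/sample (gen/string) 5)
```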
The default maximum size for s/coll-of generators is 20
As we noted earlier, by default generators created from an s/coll-of spec have a gen-max option which defaults to 20. Be sure to increase this if handling larger collection sizes is key for your test.
s/coll-of vs gen/vector sizing behavior
These two generator types use the size parameter differently:
- s/coll-of uses size to scale the contained elements, but the collection size is completely random
- gen/vector uses size to determine the size of the collection
For example:
(gen/sample (s/gen (s/coll-of integer?)) 3)
;; => ([-1 -1 -1 -1 -1 -1 0 0 -1 0 -1 0 -1 0 0 0 0 -1 0 -1]
;; [-1 0 -1 -1 -1 -1 -1 -1 0 0]
;; [0 0 0 -1 0 1 -1 0 -1 -1 0 -1])
(gen/sample (gen/vector (gen/large-integer)) 3)
;; => ([] [] [1 1])
This means that using gen/vector
may be preferable if your function under test
is more sensitive to the size of the collection than the contained elements.
Using ad-hoc generated data in example-based tests
After getting comfortable with making generators, you may be wondering about using them to generate data for your example-based tests. You can do this, but you should be very careful if you do! Remember that the key features provided by test.check are crucial for creating tests that are reliable and useful. I often see example-based tests with some or all of their inputs based on direct calls to gen/generate, presumably to avoid the drudgery of hand-crafting example inputs for large input maps. Sadly, such tests are often flaky: each run generates a fresh random input, so failures can appear and disappear between runs.
If you simply want to avoid hand-crafting data for example-based tests, use one of the following techniques to use ad-hoc data generation safely:
- save your generated test data: call the generator in a REPL session, then save the output into your test file
- generate using a seed: use the clojure.test.check.generators namespace directly (as opposed to the lazily-loaded clojure.spec.gen.alpha version), calling the 3-parameter version of generate, which allows you to specify a seed. This way each test run is completely consistent. This is useful for extremely large input collections which you don’t want to save as a literal in your test file.
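For example (a sketch; the var names are hypothetical, and note that we require clojure.test.check.generators directly since the spec.gen.alpha wrapper doesn’t expose the seed arity), passing the same size and seed to generate yields the same value every time:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.test.check.generators :as tgen])

;; (tgen/generate gen size seed): the same size + seed always
;; yields the same value, so tests using this data are deterministic.
(def input-a (tgen/generate (s/gen int?) 30 42))
(def input-b (tgen/generate (s/gen int?) 30 42))

(= input-a input-b)
;;=> true
```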
Conclusion
test.check is a generative testing library with great integration with spec. If your code is making use of spec already, then you’re halfway to creating powerful tests that can help you expose edge cases you might not have easily found otherwise.
The main challenges when creating generative tests are thinking of good general properties to check and creating reliable & efficient generators. Getting comfortable with these skills will reward you with the ability to get far more reliable unit tests than with example-based tests alone.