Writing tests in Praat
In the past year or so I’ve learned a lot of new things about how to write good and maintainable code. One of those, which has had a huge influence on the way I write, is automatic testing [1].
In a nutshell, a test script is simply a script, written in whatever language or environment you are working in, that executes the code you’ve written with some known input, and makes sure that the output is as expected. So there is nothing special about a test script: it is just another script.
Normally, tests are done using test frameworks, which make writing the tests easier, and make sure that the output conforms to some sort of standard (so you have a way of automatically checking whether the tests failed or not).
There are many frameworks out there, but I will be looking at one in particular: the Test Anything Protocol. I’ll be using this one for three main reasons:
1. It’s dead simple
TAP works with two components: the TAP producer, which produces TAP output, and the TAP consumer, which takes that output and parses it into some sort of aggregated test result. If you have Perl on your computer (and chances are that you do), then you will almost certainly have a tool called prove, which is a TAP consumer. So, since consumers are ubiquitous, we will focus on producing TAP output. It looks like this:
1..2
ok 1 - cromulate frumbles
not ok 2 - reticulate splines
Each line after the first is the result of a test, and it starts with either ok or not ok depending on whether the test was successful or not. After that you have the test number, and then an optional message, so you know what you were testing without having to look at the code itself.
The line at the top is called the plan: it tells the consumer how many tests you are going to run, so that if the test suite (= the set of tests you are running) is interrupted in the middle, you have a way of knowing that something went wrong. If you don’t know how many tests you’ll be running, the plan can optionally come at the end (by which time you should know how many tests executed). But there must always be a plan.
2. It’s widespread
Although TAP started out as a tool for testing Perl in particular, since it is so versatile (after all, it is called the Test Anything Protocol) it was soon ported to other languages. So regardless of what language you are using, chances are you’ll have a TAP library you can access.
3. It can be used from Praat
And of course, this includes Praat, since that’s what this post is about!
The library for producing TAP output from Praat is in the tap plugin, distributed through CPrAN, and it has no dependencies, which means installation should be really easy.
So let’s see how the output above can be produced:
@plan: 2
@ok: 1 + 2 == 3, "cromulate frumbles"
@ok: 2 - 2 == -2, "reticulate splines"
@done_testing()
Those procedures are defined in the more.proc file in the tap plugin, which you’ll have to include in your scripts before you can use them. But once you do, you’ll have access to all those convenience procedures you see, and many more.
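As a minimal sketch, including the file could look like the following. The relative path is an assumption: it supposes that your script sits two directories below the Praat preferences directory (for example in a test directory inside a plugin of your own), and that the tap plugin was installed there as plugin_tap, with more.proc in its procedures directory. Adjust the path to wherever the file lives on your machine.

```praat
# Assumed location of more.proc; adjust the path to your own installation
include ../../plugin_tap/procedures/more.proc

@plan: 2

@ok: 1 + 2 == 3, "cromulate frumbles"
@ok: 2 - 2 == -2, "reticulate splines"

@done_testing()
```

Since include is resolved relative to the script file, the same line should keep working no matter which directory you run the test from.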
What others? Let’s take an overview:
The TAP interface
The plan
Since all tests need a plan, the first thing you have to do is specify how many tests you will run, either by calling @plan: number_of_tests or @no_plan(). If you forget, and use the other procedures before declaring a plan, you’ll get an error. At the very end, you should also call the @done_testing() procedure, to make sure we know when we’ve finished.
And now you have a plan!
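Here is a small sketch of a script that cannot know its number of tests in advance, and so declares no plan. The random number of iterations is a made-up stand-in for things like "one test per file in a directory", and the include path is the same assumption about where the tap plugin is installed.

```praat
# Assumed location of more.proc; adjust the path to your own installation
include ../../plugin_tap/procedures/more.proc

@no_plan()

# Hypothetical situation: the number of checks is only known at run time
n = randomInteger (1, 5)
for i to n
    @ok: i <= n, "check number " + string$ (i)
endfor

# Without an up-front plan, the plan line is printed at the end
@done_testing()
```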
OK or not OK
The most basic procedure to use while testing is @ok: it takes a condition and a message, and will print either ok or not ok depending on whether the condition is true or false. It will also take care of all the other things you saw above: the numbering, the hyphens, etc.
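To get a feeling for how little is involved, here is a rough sketch of what a procedure like this has to do. This is not the code in the tap plugin: tap_count, my_plan and my_ok are hypothetical names made up for this illustration, and the sketch uses nothing but plain Praat.

```praat
# Hypothetical helpers for illustration only; not the tap plugin's code
tap_count = 0

# Print the plan line: the range of test numbers we intend to run
procedure my_plan: .total
    appendInfoLine: "1..", .total
endproc

# Number the test and report whether the condition was true
procedure my_ok: .condition, .message$
    tap_count += 1
    if .condition
        appendInfoLine: "ok ", tap_count, " - ", .message$
    else
        appendInfoLine: "not ok ", tap_count, " - ", .message$
    endif
endproc

clearinfo
@my_plan: 2
@my_ok: 1 + 2 == 3, "cromulate frumbles"
@my_ok: 2 - 2 == -2, "reticulate splines"
```

A real producer also has to check the plan against the number of tests that actually ran, which is one good reason to use the plugin instead of rolling your own.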
The more.proc file defines some higher level test procedures as well, which give you some additional benefits. In particular it gives you semantic tests, that not only know when they are wrong, but know why they are wrong (to a certain extent, of course).
What is is, and what isn’t
If you are checking whether two values are the same, you can check for that equality itself (which will have either a true or a false value). But you could also write @is: number, 10, "number is ten" if you are checking a number, or @is$: name$, "bob", "hello bob" if you are checking a string (you can also use @isnt and @isnt$, if you are checking for differences).
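Put together, the four variants could be used as in the sketch below. I am assuming here that @isnt and @isnt$ take their arguments in the same order as @is and @is$ (the value you got, the value to compare against, and a message); the variable values and the include path are made up for the example.

```praat
# Assumed location of more.proc; adjust the path to your own installation
include ../../plugin_tap/procedures/more.proc

@plan: 4

number = 2 * 5
name$ = "bob"

# Numeric comparisons
@is: number, 10, "number is ten"
@isnt: number, 11, "number is not eleven"

# String comparisons use the variants ending in $
@is$: name$, "bob", "hello bob"
@isnt$: name$, "alice", "definitely not alice"

@done_testing()
```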
The benefit of these procedures is that you are specifying the actual and the expected value, so the test can give you a more detailed message as to why it failed (if it failed). In the first example, we would have no idea why the “reticulate splines” test failed. But if we had written the test like this:
@plan: 2
@is: 1 + 2, 3, "cromulate frumbles"
@is: 2 - 2, -2, "reticulate splines"
@done_testing()
Then the output would have been
So now we have a diagnostic message, and it tells us all we need to know: we expected the value to be -2, but it is actually 0. No wonder it failed!
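The mechanism behind such a message is simple. As a hedged sketch (again, not the plugin’s own code: tap_count and my_is are hypothetical names, and the exact wording of the real diagnostics may differ), a procedure that receives both values can print them as TAP comment lines, which start with a # character:

```praat
# Hypothetical helper for illustration only; not the tap plugin's code
tap_count = 0

procedure my_is: .got, .expected, .message$
    tap_count += 1
    if .got == .expected
        appendInfoLine: "ok ", tap_count, " - ", .message$
    else
        appendInfoLine: "not ok ", tap_count, " - ", .message$
        # Lines starting with # are diagnostics for the human reader
        appendInfoLine: "#      got: ", .got
        appendInfoLine: "# expected: ", .expected
    endif
endproc

clearinfo
appendInfoLine: "1..2"
@my_is: 1 + 2, 3, "cromulate frumbles"
@my_is: 2 - 2, -2, "reticulate splines"
```

The second call is the one that fails, so its not ok line is followed by two comment lines showing the value we got (0) and the value we expected (-2).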
There are many more tests like this, including ones that let you make checks using regular expressions, or make tests that are specifically tailored for Praat objects, such as testing for object equality, or whether objects exist, or even whether some objects still remain in the Object list after the tests have completed. Check them out!
A real-world example
Let’s look at an example. Imagine you write the following procedure in Praat [2], to extract only unique strings from a Strings object:
# Make a Strings object with unique entries
procedure uniq ()
    .cat = To Categories
    .ucat = To unique Categories
    .id = To Strings
    removeObject: .cat, .ucat
endproc
Looking good! Called with a Strings object selected, it will generate a new Strings object in which all entries are unique. But now I want to write tests for this.
What would a test for this look like?
In a nutshell, a test is simply a script that makes sure that executing the code we’ve written will do what we want. This means that we need to call our code with a particular input, and make sure we get the expected result.
Let’s look first at what a simple test could look like without using the “tap” plugin. The simplest version might look something like this:
test = Create Strings as tokens: "a b b c"
@uniq()
total_strings = Get number of strings
assert total_strings == 3
In this script, we prepare the input (line #1), we execute our code (line #2), and then we make sure that we’ve removed one of the strings, since we know that there should be only 3 unique strings (the assert in line #4 tests a condition, and throws an error if the condition is false).
The show must go on
There is one problem, though: since we are using assert, this means that the test will stop whenever there is an error. As we start adding more tests, we’ll soon realise that it’s better if we can get a result of all the tests that are failing, instead of having to fix them one by one.
For starters, there are insights we can get from knowing where all the errors are. But fixing problems is also a lot slower, because you can only work on the first error you encountered, and only on one error at a time. Even if two errors are on the same line, you won’t know about the second until you’ve run the entire test suite again.
Let’s TAP!
Let’s re-write our test script using the procedures in the more.proc file in the “tap” plugin, which we talked about above. Our test script should now look like this:
@no_plan()
test = Create Strings as tokens: "a b b c"
@uniq()
total_strings = Get number of strings
@is: total_strings, 3, "removed one duplicate string"
@done_testing()
Running this would generate this output:
ok 1 - removed one duplicate string
1..1
It might look like a lot of work for two lines of output, but in all likelihood your tests will be more complex than this little example. In fact… even this little example is not complete!
Iteration makes perfect
What happens if we run our procedure on an empty Strings object? What is the unique set of strings in an empty set? Since empty Strings are valid Strings objects, I’d argue that the set of unique entries should be an empty Strings object. So we check that by adding a few more tests:
list = Create Strings as tokens: ""
@uniq()
total_strings = Get number of strings
@is: total_strings, 0, "unique set is empty"
@done_testing()
But when we run this, we find an error! And an ugly error too, because it makes our entire test suite crash. It turns out, the To Categories command requires a Strings object with at least one string. So we need to change our initial procedure:
procedure uniq ()
    .n = Get number of strings
    if .n
        .cat = To Categories
        .ucat = To unique Categories
        To Strings
        removeObject: .cat, .ucat
    else
        Copy: selected$("Strings")
    endif
    .id = selected("Strings")
endproc
So now we start by checking whether there are any strings, and if there are none, we simply make a copy of the empty Strings object. Our code is now significantly more robust!
Another thing that is missing from our test is that we are not checking for empty or whitespace strings (lowercase) in our Strings objects (uppercase). This is because the input we are preparing splits the set of characters using spaces. But we can add spaces manually, to make sure it behaves as we expect.
@no_plan()

# One duplicate ("b"), plus empty and whitespace-only strings
test = Create Strings as tokens: "a b b c"
Insert string: 1, " "
Insert string: 1, ""
Insert string: 1, tab$
Insert string: 1, newline$

original = Get number of strings
@uniq()
unique = Get number of strings
@is: original, unique + 1, "removed one duplicate string"

# An empty list should give an empty list
list = Create Strings as tokens: ""
@uniq()
total_strings = Get number of strings
@is: total_strings, 0, "unique set is empty"

@done_testing()
Closing
Our test script is already bearing fruit, and it’s but a handful of lines long. It’s made our procedure more robust, and more importantly perhaps, it’s made its behaviour clearer to us and anyone who chooses to look at the code. This makes it easier to make changes in the future, because we can be sure that the changes we’ve introduced still make the procedure behave as expected.
There are still ways to improve the test we wrote. For example, it’s not the tidiest test, since it doesn’t clean up after itself (try adding @ok_selection() just before the end to see what I mean!). But you can play with that on your own.
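If you want a starting point, here is one way the first test could tidy up, as a sketch rather than a recipe. It relies on the fact that a procedure’s local variables can be read from outside through the procedure’s name, so the object made by uniq is available as uniq.id. The include path is an assumption about where the tap plugin is installed, and the uniq procedure is repeated at the bottom only so that the script is complete.

```praat
# Assumed location of more.proc; adjust the path to your own installation
include ../../plugin_tap/procedures/more.proc

@no_plan()

test = Create Strings as tokens: "a b b c"
@uniq()
total_strings = Get number of strings
@is: total_strings, 3, "removed one duplicate string"

# Remove both the input and the object made by uniq (its id is in uniq.id)
removeObject: test, uniq.id

@done_testing()

# The uniq procedure from this post, repeated so the script is complete
procedure uniq ()
    .n = Get number of strings
    if .n
        .cat = To Categories
        .ucat = To unique Categories
        To Strings
        removeObject: .cat, .ucat
    else
        Copy: selected$("Strings")
    endif
    .id = selected("Strings")
endproc
```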
[1] Version control must take the lead, though. I mean… how can you possibly beat version control?!?
[2] Although testing is something that is applicable to all programming languages, and indeed to many other aspects of software development, I’ll keep the examples Praat-centered in this post. I encourage you to think of ways to use these principles in everything you do!