Writing tests in Praat

In the past year or so I’ve learned a lot of new things about how to write good and maintainable code. One of those, which has had a huge influence on the way I write, is automatic testing¹.

In a nutshell, a test script is simply a script, written in whatever language or environment you are working in, that executes the code you’ve written with some known input, and makes sure that the output is as expected. So there is nothing special about a test script: it is just another script.

Normally, tests are done using test frameworks, which make writing the tests easier, and make sure that the output conforms to some sort of standard (so you have a way of automatically checking whether the tests failed or not).

There are many frameworks out there, but I will be looking at one in particular: the Test Anything Protocol (TAP). I’ll be using this one for three main reasons:

1. It’s dead simple

TAP works with two components: the TAP producer, which produces TAP output, and the TAP consumer, which takes that output and parses it into some sort of aggregated test result. If you have Perl on your computer (and chances are that you do), then you will almost certainly have a tool called prove, which is a TAP consumer. So, since consumers are ubiquitous, we will focus on producing TAP output. It looks like this:

1..2
ok 1 - cromulate frumbles
not ok 2 - reticulate splines

Each line is the result of a test, and it starts with either ok or not ok depending on whether the test was successful or not. After that you have the test number, and then an optional message, so you know what you were testing without having to look at the code itself.

The line at the top is called the plan: it tells the consumer how many tests you are going to run, so that if the test suite (= the set of tests you are running) is interrupted in the middle, you have a way of knowing that something went wrong. If you don’t know how many tests you’ll be running, the plan can optionally come at the end (by which time you should know how many tests executed). But there must always be a plan.
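
To make the format concrete, here is a minimal sketch of a TAP producer written in plain Praat, with no plugin involved. The procedure names tap_ok and tap_done and the two counters are hypothetical, made up for this illustration; this is not how the tap plugin is implemented. It puts the plan at the end, as described above.

```praat
# Hand-rolled TAP producer, for illustration only.
# tap_ok and tap_done are hypothetical names, not part of the tap plugin.
testCount = 0
failCount = 0

clearinfo
@tap_ok: 1 + 2 == 3, "cromulate frumbles"
@tap_ok: 2 - 2 == -2, "reticulate splines"
@tap_done()

# Print one TAP result line and keep a running count of tests and failures
procedure tap_ok: .passed, .message$
    testCount += 1
    if .passed
        .status$ = "ok"
    else
        .status$ = "not ok"
        failCount += 1
    endif
    appendInfoLine: .status$, " ", testCount, " - ", .message$
endproc

# No plan was declared up front, so print it last, once the count is known
procedure tap_done ()
    appendInfoLine: "1..", testCount
endproc
```

Run from Praat, this should write the same two result lines shown above to the Info window, followed by the plan (1..2) on the last line.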

2. It’s widespread

Although TAP started out as a tool for testing Perl in particular, since it is so versatile (after all, it is called the Test Anything Protocol) it was soon ported to other languages. So regardless of what language you are using, chances are you’ll have a TAP library you can access.

3. It can be used from Praat

And of course, this includes Praat, since that’s what this post is about!

The library for producing TAP output from Praat is in the tap plugin, distributed through CPrAN, and it has no dependencies, which means installation should be really easy.

So let’s see how the output above can be produced:

@plan: 2

@ok: 1 + 2 ==  3, "cromulate frumbles"
@ok: 2 - 2 == -2, "reticulate splines"

@done_testing()

Those procedures are defined in the more.proc file in the tap plugin, which you’ll have to include in your scripts before you can use them. But once you do, you’ll have access to all those convenience procedures you see, and many more. What others? Let’s take an overview:

The TAP interface

The plan

Since all tests need a plan, the first thing you have to do is specify how many tests you will run, either by calling @plan: number_of_tests or @no_plan(). If you forget, and use the other functions before calling a plan, you’ll get an error. At the very end, you should also call the @done_testing() procedure, to make sure we know when we’ve finished.

And now you have a plan!
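
As a rough sketch of the other way to plan, this is what a script might look like when you don't know the number of tests in advance. The include path is an assumption on my part: it supposes the script lives in the directory of another plugin next to plugin_tap, so adjust it to wherever the tap plugin is installed on your machine.

```praat
# Assumed location of the tap plugin relative to this script;
# change the path to match your own installation.
include ../../plugin_tap/procedures/more.proc

# We do not commit to a number of tests up front
@no_plan()

@ok: 1 + 2 == 3, "cromulate frumbles"
@ok: length("splines") == 7, "count the characters"

# With no plan declared, the plan line is expected at the end
@done_testing()
```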

OK or not OK

The most basic procedure to use while testing is @ok: it takes a condition and a message, and will print either ok or not ok depending on whether the condition is true or false. It will also take care of all the other things you saw above: the numbering, the hyphens, etc.

The more.proc file defines some higher level test procedures as well, which give you some additional benefits. In particular it gives you semantic tests, that not only know when they are wrong, but know why they are wrong (to a certain extent, of course).

What is is, and what isn’t

If you are checking whether two values are the same, you can check for that equality itself (which will have either a true or a false value). But you could also write @is: number, 10, "number is ten" if you are checking a number, or @is$: name$, "bob", "hello bob" if you are checking a string (you can also use @isnt and @isnt$, if you are checking for differences).

The benefit of these procedures is that you are specifying the actual and the expected value, so the test can give you a more detailed message as to why it failed (if it failed). In the first example, which used @ok, we would have no idea why the “reticulate splines” test failed. But if we had written the test like this:

@plan: 2

@is: 1 + 2,  3, "cromulate frumbles"
@is: 2 - 2, -2, "reticulate splines"

@done_testing()

Then the output would have been

1..2
ok 1 - cromulate frumbles
not ok 2 - reticulate splines
#   Failed test 'reticulate splines'
#          got: 0
#     expected: -2

So now we have a diagnostic message, and it tells us all we need to know: we expected the value to be -2, but it is actually 0. No wonder it failed!
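
The string and negated variants can be sketched in the same way. I'm assuming here that @isnt and @isnt$ take their arguments in the same order as @is and @is$ (value, value to compare against, message), and the include path is again an assumption about where the plugin is installed. The variable values are made up for the example.

```praat
# Assumed location of the tap plugin; adjust to your installation.
include ../../plugin_tap/procedures/more.proc

@plan: 3

name$ = "bob"
number = 10

# String comparison: note the $ in the procedure name
@is$: name$, "bob", "hello bob"

# Negated checks (argument order assumed to mirror @is and @is$)
@isnt: number, 11, "number is not eleven"
@isnt$: name$, "alice", "name is not alice"

@done_testing()
```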

There are many more tests like this, including ones that let you make checks using regular expressions, or make tests that are specifically tailored for Praat objects, such as testing for object equality, or whether objects exist, or even whether some objects still remain in the Object list after the tests have completed. Check them out!

A real-world example

Let’s look at an example. Imagine you write the following procedure in Praat², to extract only unique strings from a Strings object:

# Make a Strings object with unique entries
procedure uniq ()
  .cat  = To Categories
  .ucat = To unique Categories
  .id   = To Strings
  removeObject: .cat, .ucat
endproc

Looking good! Called with a Strings object selected, it will generate a new Strings object in which all entries are unique. But now I want to write tests for this.

What would a test for this look like?

In a nutshell, a test is simply a script that makes sure that executing the code we’ve written will do what we want. This means that we need to call our code with a particular input, and make sure we get the expected result.

Let’s look first at what a simple test could look like without using the “tap” plugin. The simplest version might look something like this:

test = Create Strings as tokens: "a b b c"
@uniq()
total_strings = Get number of strings
assert total_strings == 3

In this script, we prepare the input (line #1), we execute our code (line #2), and then we make sure that we’ve removed one of the strings, since we know that there should be only 3 unique strings (the assert in line #4 tests a condition, and throws an error if the condition is false).

The show must go on

There is one problem, though: since we are using assert, this means that the test will stop whenever there is an error. As we start adding more tests, we’ll soon realise that it’s better if we can get a result of all the tests that are failing, instead of having to fix them one by one.

For starters, there are insights we can get from knowing where all the errors are. But fixing problems is also a lot slower, because you can only work on the first error you encountered, and only on one error at a time. Even if two errors are on the same line, you won’t know about the second until you’ve run the entire test suite again.

Let’s TAP!

Let’s re-write our test script using the procedures in the more.proc file in the “tap” plugin, which we talked about above. Our test script should now look like this:

@no_plan()

test = Create Strings as tokens: "a b b c"
@uniq()
total_strings = Get number of strings

@is: total_strings, 3, "removed one duplicate string"

@done_testing()

Running this would generate this output:

ok 1 - removed one duplicate string
1..1

It might look like a lot of work for two lines of output, but in all likelihood your tests will be more complex than this little example. In fact… even this little example is not complete!

Iteration makes perfect

What happens if we run our procedure on an empty Strings object? What is the unique set of strings in an empty set? Since empty Strings are valid Strings objects, I’d argue that the set of unique entries should be an empty Strings object. So we check that by adding a few more tests:

list = Create Strings as tokens: ""
@uniq()
total_strings = Get number of strings
@is: total_strings, 0, "unique set is empty"

@done_testing()

But when we run this, we find an error! And an ugly error too, because it makes our entire test suite crash. It turns out, the To Categories command requires a Strings object with at least one string. So we need to change our initial procedure:

procedure uniq ()
  .n = Get number of strings
  if .n
    .cat  = To Categories
    .ucat = To unique Categories
    To Strings
    removeObject: .cat, .ucat
  else
    Copy: selected$("Strings")
  endif
  .id = selected("Strings")
endproc

So now we start by checking whether there are any strings, and if there are none, we simply make a copy of the empty Strings object. Our code is now significantly more robust!

Another thing that is missing from our test is that we are not checking for empty or whitespace strings (lowercase) in our Strings objects (uppercase). This is because the input we are preparing splits the set of characters using spaces. But we can add spaces manually, to make sure it behaves as we expect.

test = Create Strings as tokens: "a b b c"
Insert string: 1, " "
Insert string: 1, ""
Insert string: 1, tab$
Insert string: 1, newline$
original = Get number of strings
@uniq()
unique = Get number of strings

@is: original, unique + 1, "removed one duplicate string"

list = Create Strings as tokens: ""
@uniq()
total_strings = Get number of strings
@is: total_strings, 0, "unique set is empty"

@done_testing()

Closing

Our test script is already bearing fruit, and it’s but a handful of lines long. It’s made our procedure more robust, and more importantly perhaps, it’s made its behaviour clearer to us and anyone who chooses to look at the code. This makes it easier to make changes in the future, because we can be sure that the changes we’ve introduced still make the procedure behave as expected.

There are still ways to improve the test we wrote. For example, it’s not the tidiest test, since it doesn’t clean up after itself (try adding @ok_selection() just before the end to see what I mean!). But you can play with that on your own.
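
If you want a head start, here is one possible sketch of a test that tidies up after itself. It only uses removeObject, which is built into Praat, together with the uniq procedure from above. The include path is an assumption about where the tap plugin is installed, and the Create Strings as tokens command is written as it appears in this post (your Praat version may expect an extra argument for the separators).

```praat
# Assumed location of the tap plugin; adjust to your installation.
include ../../plugin_tap/procedures/more.proc

@no_plan()

test = Create Strings as tokens: "a b b c"
@uniq()
total_strings = Get number of strings
@is: total_strings, 3, "removed one duplicate string"

# Remove both the input and the result, so that nothing we created
# is left behind in the Object list
removeObject: test, uniq.id

@done_testing()

# The more robust version of the procedure, copied from above
procedure uniq ()
  .n = Get number of strings
  if .n
    .cat  = To Categories
    .ucat = To unique Categories
    To Strings
    removeObject: .cat, .ucat
  else
    Copy: selected$("Strings")
  endif
  .id = selected("Strings")
endproc
```

Storing the id of the result in uniq.id is what makes the cleanup possible: the test never has to guess which object the procedure created.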

¹ Version control must take the lead, though. I mean… how can you possibly beat version control?!?

² Although testing is something that is applicable to all programming languages, and indeed to many other aspects of software development, I’ll keep the examples Praat-centered in this post. I encourage you to think of ways to use these principles in everything you do!