The other day at work I came across an interesting piece of code during code review. The code was testing a plain-text email that was supposed to include some JSON dumps, and it wanted to make sure that the dumps contained the right data.

Let’s see an example. Let’s say the email was something like this:

Foo!

    { "foo": true }

Bar!

    { "bar": false }

Zod!

And the objective was to extract the embedded JSON data:

[
    { "foo": true },
    { "bar": false }
]

The code I saw did this by splitting the body of the email into characters and iterating over every one of them. It then detected whether the current character was an opening or closing brace, and by keeping track of how many it had seen so far, it decided whether or not it was inside a chunk of JSON.
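To give an idea of what that looks like, here is a minimal sketch of the brace-counting approach. It is not the code from the review: the helper name extract_by_brace_counting is made up for illustration, and JSON::PP is assumed only because it ships with Perl.

```perl
use strict;
use warnings;
use JSON::PP qw( decode_json );

# Hypothetical helper, not the reviewed code: collect every top-level
# {...} span by counting braces, then try to decode each span.
sub extract_by_brace_counting {
    my ($text) = @_;

    my @objects;
    my $depth  = 0;
    my $buffer = '';

    for my $char ( split //, $text ) {
        $depth++ if $char eq '{';

        # Only keep characters while we are inside a candidate object
        next unless $depth > 0;
        $buffer .= $char;

        next unless $char eq '}';

        # Closing the outermost brace ends the candidate object
        if ( --$depth == 0 ) {
            my $object = eval { decode_json($buffer) };
            push @objects, $object if defined $object;
            $buffer = '';
        }
    }

    return \@objects;
}

my $email = <<'EMAIL';
Foo!

    { "foo": true }

Bar!

    { "bar": false }

Zod!
EMAIL

my $objects = extract_by_brace_counting($email);
```

Even this short version has a blind spot: a brace inside a JSON string value would throw the count off, and handling that means tracking quotes and escapes as well.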

It took me a while to realise that this was what it was doing, and when I saw it I thought: “there must be a simpler way”.

And it turned out there was.

Incremental parsing

After a couple of online searches, I came across a note on incremental parsing in JSON::XS. From the documentation:

In some cases, there is the need for incremental parsing of JSON texts. While this module always has to keep both JSON text and resulting Perl data structure in memory at one time, it does allow you to parse a JSON stream incrementally. It does so by accumulating text until it has a full JSON object, which it then can decode.
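To make the "accumulating text" part concrete, here is a small sketch that feeds a parser a single object in two chunks. It uses JSON::PP rather than JSON::XS (an assumption on my part, picked because it is a core module and offers the same incr_parse method), and the variable names are only illustrative.

```perl
use strict;
use warnings;
use JSON::PP;

my $parser = JSON::PP->new;

# The first chunk leaves the object open, so in scalar context
# there is nothing to return yet
my $partial = $parser->incr_parse('{ "foo": ');

# The second chunk closes the object, which can now be decoded
my $complete = $parser->incr_parse('true }');
```

After the first call $partial should be undefined, while $complete should hold the decoded hash reference.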

After playing around a little bit, I came up with this, which does exactly what that bit of code was trying to do, and does it in a far more maintainable way:¹

use JSON::MaybeXS qw( JSON );

my $json = JSON->new;

# Called in void context, this only loads the text into the parser,
# so that we can pull the objects out one at a time below
$json->incr_parse($text);

my @objects;
while (1) {
    # Try to get a single JSON object
    my $object = eval { $json->incr_parse };

    # Save whatever we've managed to successfully parse. This comes
    # first so that an object at the very end of the text is not lost
    if ($object) {
        push @objects, $object;
        next;
    }

    # End if there is nothing left loaded in the parser
    last unless $json->incr_text;

    # Otherwise, skip what is currently loaded in the parser
    # and try again
    $json->incr_skip;
}

And as the use of JSON::MaybeXS suggests, this feature is supported by all of the backends it can load: Cpanel::JSON::XS, JSON::XS, and JSON::PP.

  1. The jury is still out on whether doing this was a good idea. But as long as we’re going to be doing it, I’d rather we do it properly.