Today I would like to announce JsonHilo.js – the project I hinted at in a previous post.
It is an ultra-fast lossless streaming JSON parser with two interfaces:
the low-level JsonLow, the ultra-fast core that provides the unique feature of losslessness, and the high-level JsonHigh.
JsonHigh fills the streaming JSON parser gap in the Deno ecosystem. It can also work as a good modern alternative in the browser and in Node.js.
JsonLow was originally created as a prerequisite to implementing an accurate JSON-Jevko translator, as no JSON parser known to me could fulfill my needs.
In particular, all of them are lossy: there is no way to recover the exact input, including whitespace and string escape sequences, from their output.
I needed something lossless, but also fast and minimal.
Thus JsonHilo was born.
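To make the lossiness concrete, here is a small sketch that uses only the built-in JSON.parse and JSON.stringify (not JsonHilo). The sample input is made up for illustration; the point is that whitespace, escape sequences, and the exact spelling of numbers do not survive a round trip.

```javascript
// Sample input (hypothetical): contains a tab, a newline, extra spaces,
// the escape sequences \u0041 and \/, and the number spelled as 1.0.
const input = '{ "a":\t"\\u0041\\/b" ,\n "n": 1.0 }'

// Parse into a value, then serialize it again.
const roundTripped = JSON.stringify(JSON.parse(input))

console.log(input === roundTripped) // prints false
console.log(roundTripped)           // prints {"a":"A/b","n":1}
```

The parsed value carries no trace of how the input was written, so no serializer can reconstruct the original text from it.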
As far as I can benchmark, JsonHilo is the fastest streaming JSON parser in JavaScript.
This is based on a comparison with Clarinet, which is the fastest other parser I could find.
Despite having spent quite some time on optimizing and benchmarking, I’m sure I haven’t optimized everything there is to optimize (nor did I need to), so battle testing might reveal further possibilities.
Proper benchmarking is very difficult, so, as always, caveats apply:
I am working to improve the benchmarks further – all help and contributions are welcome.
With the above in mind, the gist of the results is this: low-level JsonHilo seems to be around 2x faster than Clarinet.
For a dramatic example, let’s take a 3.2 GB JSON file obtained like this:
curl https://dumps.wikimedia.org/other/wikibase/wikidatawiki/20210623/wikidata-20210623-lexemes.json.bz2 | bunzip2 > big.json
and run it through a benchmark which traverses the entire JSON tree to count how many values it holds:
sh avg.sh values.sh big.json
This yields the following results on my modest machine running up-to-date Linux:
command | average time (s) | ratio |
---|---|---|
deno run jsonhilo/values.js < big.json | 38.998 | 1.000 |
node jsonhilo/values.node.js < big.json | 42.886 | 1.100 |
node clarinet/values.js < big.json | 97.832 | 2.509 |
Over 200 million values in under 39 seconds. Nearly identical performance on Deno and Node.js. 2.5x faster than Clarinet. Overall not too shabby.
Because JsonHilo generates events for all of the input code points without converting or stripping anything (including whitespace), it makes it possible to implement things that cannot be built on top of other parsers.
For example, an accurate JSON highlighter is trivial:
It is only a matter of spitting the code points back out with ANSI escape codes (for a terminal) or HTML tags or what have you attached, according to the events.
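The idea can be sketched without any library. The code below is not JsonHilo's API: `scan`, `highlight`, `colors`, and the event kinds are hypothetical names, and the scanner is a deliberately tiny, non-validating stand-in. It only shows the principle: every code point is reported with a kind, and the highlighter echoes it back wrapped in an ANSI color code.

```javascript
// Hypothetical sketch, NOT JsonHilo's API.
// Reports every code point of the input, tagged with a kind,
// without dropping or converting anything. Does not validate the JSON.
function scan(text, emit) {
  let inString = false, escaped = false, hexLeft = 0, prev = ''
  const out = (kind, ch) => { prev = kind; emit(kind, ch) }
  for (const ch of text) { // for...of iterates by code point
    if (inString) {
      if (hexLeft > 0) { hexLeft -= 1; out('escape', ch) } // hex digits of \uXXXX
      else if (escaped) { escaped = false; if (ch === 'u') hexLeft = 4; out('escape', ch) }
      else if (ch === '\\') { escaped = true; out('escape', ch) }
      else { if (ch === '"') inString = false; out('string', ch) }
    }
    else if (ch === '"') { inString = true; out('string', ch) }
    else if (' \t\n\r'.includes(ch)) out('whitespace', ch)
    else if ('{}[]:,'.includes(ch)) out('punctuation', ch)
    else if ('-+.0123456789'.includes(ch)) out('number', ch)
    else if ((ch === 'e' || ch === 'E') && prev === 'number') out('number', ch) // exponent
    else out('literal', ch) // letters of true, false, null
  }
}

// ANSI SGR color codes per kind (an arbitrary palette).
const colors = { string: 32, escape: 36, number: 33, literal: 35, punctuation: 90 }

// Echo each code point back, wrapped in a color unless it is whitespace.
function highlight(text) {
  let result = ''
  scan(text, (kind, ch) => {
    result += kind === 'whitespace' ? ch : `\x1b[${colors[kind]}m${ch}\x1b[0m`
  })
  return result
}

const sample = '{ "k": [1.5e3, true, "a\\u0041\\n"],\n  "z": null }'
const highlighted = highlight(sample)
console.log(highlighted)

// Losslessness check: removing the color codes gives back the exact input.
const stripped = highlighted.replace(/\x1b\[\d+m/g, '')
console.log(stripped === sample)
```

Because nothing is converted along the way, stripping the color codes recovers the input character for character, which is the property a lossy parser cannot offer.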
Similarly we can translate JSON to another format while preserving as much of the original as we like. I use this to translate JSON to Data Jevko to compare size and performance. Preliminary results are in line with predictions – Jevko is smaller and significantly faster. More on that in a future post.
See JsonHilo on GitHub for more. Issues and contributions welcome.
JsonHilo.js is released under the MIT license, so it can be used without restrictions.
Use it in your projects, share it with anyone who might find it useful, and let me know if it causes or (preferably) solves any problems for you. ;)
Above all, have fun!
Write to me at dariusz.jedrzejczak.work+jsonhilo