Fiftyfive bytes of WebAssembly
To be honest, I only have a vague idea of how it works and currently no time to think about it. However, I do know that there’s a POPCNT instruction in my CPU (apparently introduced with SSE4 around 20072).
It turns out that WebAssembly also has a POPCNT instruction, though. Wouldn’t it be cool to use that instead of three lines of bittwiddling?
Off I go to write the following WebAssembly program in the WebAssembly s-expression representation. This defines a module with one exported function called popcnt that takes a 32 bit integer and returns a 32 bit integer. The body of the function applies the 32 bit integer popcnt instruction to local 0 — the function’s first argument.
(module (func (export "popcnt") (param i32) (result i32) (i32.popcnt (get_local 0))))
Now, the typical workflow for serious WebAssembly use is way overkill. You use a compiler to compile your code (C, C++, Rust) to WebAssembly bytecode. You probably need some special cross compilation stuff too. Then you put the bytecode file onto some webserver somewhere, so your browser can see it and load it over the network. Then you need to feed the bytestream to the WebAssembly compiler in the browser.
I don’t want to write a C/C++/Rust program for a single instruction and install their WebAssembly toolchains. There is a small tool to turn s-expression
.wat files into a
.wasm bytecode files called
wat2wasm. I don’t want to install that either. Luckily there’s a demo website where you can also download the compiled
var bytes = Uint8Array.from([0x00,0x61,0x73,0x6d,0x01,0x00,0x00,0x00,0x01,0x06,0x01,0x60,0x01,0x7f,0x01,0x7f, 0x03,0x02,0x01,0x00,0x07,0x0a,0x01,0x06,0x70,0x6f,0x70,0x63,0x6e,0x74,0x00,0x00, 0x0a,0x07,0x01,0x05,0x00,0x20,0x00,0x69,0x0b,0x00,0x0c,0x04,0x6e,0x61,0x6d,0x65, 0x02,0x05,0x01,0x00,0x01,0x00,0x00]);
Fun fact: the sequence 70 6f 70 63 6e 74 on the second line spells the name of the exported function: popcnt. (The other bytes probably mean something, too.)
Great! We have some bytes without having to set up a webserver somewhere. Now let’s feed them to WebAssembly. We
instantiate something with our bytecode in
Promise of something which has an
instance which has the
exports of our WebAssembly module. In our case there’s only
I have no idea how to synchronize on a Promise to go from async land back into sync land, so imagine the rest of the code being inside the callback.
What can we do? Call popcnt! And it works!
popCount against the WebAssembly
var count = 1000000; var res = 0; console.time("wasm"); for (var i = 0; i < count; i++) res += popcnt(i); console.timeEnd("wasm"); console.log(res); // Let's run the JS version for comparison. res = 0; console.time("js"); for (var j = 0; j < count; j++) res += popcount(j); console.timeEnd("js"); console.log(res);
Here are some numbers out of this highly scientific benchmark:
|106 calls in||WASM (ms)||JS (ms)|
work procrastination: check what all of this actually compiles to. Obviously these are not proper benchmarks. However, the numbers are all suspiciously close. It would be a shame to go through all of this trouble only to find out that they just use the same software implementation of POPCNT. On the other hand, I would be quite impressed if it turns out that v8 actually recognizes the software implementation and emits a POPCNT instruction. Also, check whether they work around the performance bug in Intel’s implementation.6
Find the code here.
“Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections” (Steindorfer and Vinju, OOPSLA 2015, https://doi.org/10.1145/2814270.2814312)↩