Improve b64url benchmarking

nickva · nickva · commit bbbbfc82de3a · 2025-12-16T12:13:08.000-05:00
Use erlperf as recommended in https://www.erlang.org/doc/system/benchmarking.html

- Generate data in a more deterministic way and outside the main benchmarking loop
- Benchmark encoding and decoding separately
- Benchmark a wide range of sizes

Issue: #5801
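In the decode runs, `benchmark.sh` hands each runner `b64url:encode(rand:bytes(round(N * 3/4)))`. Base64 maps every 3 input bytes to 4 characters, so the encoded input is roughly N characters long. The minimal sketch below checks that arithmetic. It assumes coreutils `base64` and `tr` as a stand-in for the Erlang encoders, which is not part of this commit. The helper names `b64url_encode`, `decode_input_len` and `b64url_len` are hypothetical. Only lengths matter here, so zero bytes from `/dev/zero` replace `rand:bytes/1`.

```shell
#!/bin/bash
# Sketch (not part of the commit): check the decode-input sizing used in
# benchmark.sh, where the decoder is fed b64url:encode(rand:bytes(round(N * 3/4))).
# Uses coreutils base64/tr instead of Erlang; helper names are hypothetical.
set -eu

# Unpadded base64url: standard base64, map '/' to '_' and '+' to '-',
# then drop '=' padding and any line wrapping added by base64.
b64url_encode() {
  base64 | tr '/+' '_-' | tr -d '=\n'
}

# round(N * 3/4) for positive N. Erlang's round/1 rounds halves away from
# zero (up, for positive N), which is floor((3N + 2) / 4) in integer math.
decode_input_len() {
  echo $(( (3 * $1 + 2) / 4 ))
}

# Expected unpadded base64url length of M bytes: ceil(4M / 3).
b64url_len() {
  echo $(( (4 * $1 + 2) / 3 ))
}

for n in 50 100 150 200 500 1000 5000 10000 50000; do
  raw=$(decode_input_len "$n")
  # Zero bytes are enough here: only the encoded length is measured.
  actual=$(head -c "$raw" /dev/zero | b64url_encode | wc -c)
  actual=$(( actual ))  # strip leading spaces some wc implementations print
  expected=$(b64url_len "$raw")
  echo "N=${n} raw_bytes=${raw} encoded_chars=${actual} (expected ${expected})"
done

# Alphabet check: bytes 0xfb 0xff are "+/8=" in standard base64, "-_8" in base64url.
printf '\373\377' | b64url_encode; echo
```

Each size produces one line, for example `N=100 raw_bytes=75 encoded_chars=100 (expected 100)`. Sizes such as 50 and 150 land one character over N because of the rounding, which is close enough for a throughput comparison.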
diff --git a/src/b64url/README.md b/src/b64url/README.md
@@ -14,24 +14,45 @@ decoding Base64 URL values:
 
 ## Performance
 
-This implementation is significantly faster than the Erlang version it replaced
-in CouchDB. The `benchmark.escript` file contains the original implementation
-(using regular expressions to replace unsafe characters in the output of the
-`base64` module) and can be used to compare the two for strings of various
-lengths. For example:
+This implementation is faster than OTP's `base64` module (OTP 26-28),
+especially for larger binaries (1000+ bytes). For ~100-byte inputs the
+two are close, and OTP's encoder can be slightly faster. To benchmark,
+clone and build erlperf as described in `benchmark.sh`, then run `./benchmark.sh`.
+Future JIT improvements may make OTP's `base64` faster than the NIF; it isn't yet.
 
 ```
-ERL_LIBS=_build/default/lib/b64url/ ./test/benchmark.escript 4 10 100 30
-erl :       75491270 bytes /  30 seconds =     2516375.67 bps
-nif :      672299342 bytes /  30 seconds =    22409978.07 bps
-```
+./benchmark.sh
+
+[...]
+
+--- bytes: 100 -----
+Code                   ||        QPS       Time   Rel
+encode_otp_100          1    1613 Ki     620 ns  100%
+encode_nif_100          1    1391 Ki     719 ns   86%
+Code                   ||        QPS       Time   Rel
+decode_nif_100          1    1453 Ki     688 ns  100%
+decode_otp_100          1    1395 Ki     716 ns   96%
+
+[...]
 
-This test invocation spawns four workers that generate random strings between 10
-and 100 bytes in length and then perform an encode/decode on them in a tight
-loop for 30 seconds, and then reports the aggregate encoded data volume. Note
-that the generator overhead (`crypto:strong_rand_bytes/1`) is included in these
-results, so the relative difference in encoder throughput is rather larger than
-what's reported here.
+--- bytes: 1000 -----
+Code                    ||        QPS       Time   Rel
+encode_nif_1000          1     369 Ki    2711 ns  100%
+encode_otp_1000          1     204 Ki    4904 ns   55%
+Code                    ||        QPS       Time   Rel
+decode_nif_1000          1     455 Ki    2196 ns  100%
+decode_otp_1000          1     178 Ki    5612 ns   39%
+
+[...]
+
+--- bytes: 10000000 -----
+Code                        ||        QPS       Time   Rel
+encode_nif_10000000          1         45   22388 us  100%
+encode_otp_10000000          1         19   51724 us   43%
+Code                        ||        QPS       Time   Rel
+decode_nif_10000000          1         55   18078 us  100%
+decode_otp_10000000          1         17   60020 us   30%
+```
 
 ## Timeslice Consumption
 
diff --git a/src/b64url/benchmark.sh b/src/b64url/benchmark.sh
@@ -0,0 +1,22 @@
+#!/bin/bash
+
+# Expects the erlperf escript at ./erlperf/erlperf, e.g. built with:
+#
+# $ git clone https://github.com/max-au/erlperf.git
+# $ cd erlperf
+# $ rebar3 as prod escriptize
+# $ cd ..
+
+for i in 50 100 150 200 500 1000 5000 10000 50000 1000000 10000000; do
+ echo ""
+ echo "--- bytes: ${i} -----"
+ ERL_LIBS="." erlperf/erlperf -w 2 \
+  'runner(Bin) -> b64url:encode(Bin).' --label "encode_nif_${i}" \
+  'runner(Bin) -> base64:encode(Bin, #{mode => urlsafe, padding => false}).' --label "encode_otp_${i}" \
+  --init_runner_all "rand:seed(default,{1,2,3}), rand:bytes(${i})."
+
+ ERL_LIBS="." erlperf/erlperf -w 2 \
+  'runner(Enc) -> b64url:decode(Enc).' --label "decode_nif_${i}" \
+  'runner(Enc) -> base64:decode(Enc, #{mode => urlsafe, padding => false}).' --label "decode_otp_${i}" \
+  --init_runner_all "rand:seed(default,{1,2,3}), b64url:encode(rand:bytes(round(${i} * (3/4))))."
+done
diff --git a/src/b64url/test/benchmark.escript b/src/b64url/test/benchmark.escript