|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: Typelevel Native |
| 4 | +category: technical |
| 5 | + |
| 6 | +meta: |
| 7 | + nav: blog |
| 8 | + author: armanbilge |
| 9 | +--- |
| 10 | + |
| 11 | +We recently published several major Typelevel projects for the [Scala Native] platform, most notably [Cats Effect], [FS2], and [http4s]. This blog post explores what this new platform means for the Typelevel ecosystem as well as how it works under the hood. |
| 12 | + |
| 13 | +### What is Scala Native? |
| 14 | + |
| 15 | +[Scala Native] is an optimizing ahead-of-time compiler for the Scala language. Put simply: it enables you to **compile Scala code directly to native executables**. |
| 16 | + |
| 17 | +It is an ambitious project following in the footsteps of [Scala.js]. Instead of targeting JavaScript, the Scala Native compiler targets the [LLVM] IR and uses its toolchain to generate native executables for a range of architectures, including x86, ARM, and in the near future [WebAssembly][web assembly]. |
| 18 | + |
| 19 | +### Why is this exciting? |
| 20 | + |
| 21 | +**For Scala in general**, funnily enough, I think [GraalVM Native Image] does a great job of summarizing the advantages of native executables, namely: |
| 22 | +* instant startup that immediately achieves peak performance, without requiring warmup or the heavy footprint of the JVM |
| 23 | +* packageable into small, self-contained binaries for easy deployment and distribution |
| 24 | + |
| 25 | +It is worth mentioning that in benchmarks Scala Native handily beats GraalVM Native Image on startup time, runtime footprint, and binary size. |
| 26 | + |
| 27 | +Moreover, breaking free from the JVM is an opportunity to design a runtime specifically optimized for the Scala language itself. This is the true potential of the Scala Native project. |
| 28 | + |
| 29 | +**For Typelevel in particular**, Scala Native opens new doors for leveling up our ecosystem. Our flagship libraries are largely designed for deploying high-performance, I/O-bound microservices, and for the first time ever **we now have direct access to kernel I/O APIs**. |
| 30 | + |
| 31 | +I am also excited to use Cats Effect with (non-Scala) native libraries that expose a C API. [`Resource`] and more generally [`MonadCancel`] are powerful tools for safely navigating manual memory management with all the goodness of error handling and cancelation. |
| 32 | + |
| 33 | +### How can I try it? |
| 34 | + |
| 35 | +Christopher Davenport has put up a [scala-native-ember-example](https://github.com/ChristopherDavenport/scala-native-ember-example) and reported some [benchmark results](#ember-native-benchmark)! |
| 36 | + |
| 37 | +### How does it work? |
| 38 | + |
| 39 | +The burden of cross-building the Typelevel ecosystem for Scala Native fell almost entirely to [Cats Effect] and [FS2]. |
| 40 | + |
| 41 | +#### Event loop runtime |
| 42 | + |
| 43 | +**To cross-build Cats Effect for Native we had to get creative** because Scala Native currently does not support multithreading (although it will in the next major release). This is a similar situation to the JavaScript runtime, which is also fundamentally single-threaded. But an important difference is that JS runtimes are implemented with an [event loop] and offer callback-based APIs for scheduling timers and performing non-blocking I/O. An *event loop* is a type of runtime that enables compute tasks, timers, and non-blocking I/O to be interleaved on a single thread (although not every event loop does all these things). |
| 44 | + |
| 45 | +Meanwhile, Scala Native core does not implement an event loop nor offer such APIs. There is the [scala-native-loop] project, which wraps the [libuv] event loop runtime, but we did not want to bake such an opinionated dependency into Cats Effect core. |
| 46 | + |
| 47 | +Fortunately, Daniel Spiewak had the fantastic insight that the “dummy runtime” I had created to initially cross-build Cats Effect for Native could be reformulated into a legitimate event loop implementation by extending it with the capability to “poll” for I/O events: a `PollingExecutorScheduler`. |
| 48 | + |
| 49 | +The [`PollingExecutorScheduler`] implements both [`ExecutionContext`] and [`Scheduler`] and maintains two queues: |
| 50 | +- a queue of tasks (read: fibers) to execute |
| 51 | +- a priority queue of timers (read: `IO.sleep(...)`), sorted by expiration |
| 52 | + |
| 53 | +It also defines an abstract method: |
| 54 | +```scala |
| 55 | +def poll(timeout: Duration): Boolean |
| 56 | +``` |
| 57 | + |
| 58 | +The idea of this method is very similar to `Thread.sleep()` except that besides sleeping it may also “poll” for I/O events. It turns out that APIs like this are ubiquitous in C libraries that perform I/O. |
| 59 | + |
| 60 | +To demonstrate the API contract, consider invoking `poll(3.seconds)`: |
| 61 | + |
| 62 | +*I have nothing to do for the next 3 seconds. So wake me up then, or earlier if there is an incoming I/O event that I should handle. But wake me up no later!* |
| 63 | + |
| 64 | +*Oh, and don’t forget to tell me whether there are still outstanding I/O events (`true`) or not (`false`) so I know if I need to call you again. Thanks!* |
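To make the contract concrete, here is a minimal sketch of a poller that has no I/O sources at all, so the only thing `poll` can do is sleep. `Poller` and `SleepOnlyPoller` are hypothetical names for illustration, not Cats Effect APIs. One design choice worth noting: with nothing that could ever wake it up, an infinite timeout would hang forever, so this sketch returns immediately and reports `false` instead.

```scala
import scala.concurrent.duration._

// Hypothetical trait mirroring the `poll` signature above.
trait Poller {
  /** Sleep for at most `timeout`, handling any I/O events that arrive.
    * Returns true if there is still outstanding I/O to wait for. */
  def poll(timeout: Duration): Boolean
}

// Hypothetical poller with no I/O sources: it can only sleep.
object SleepOnlyPoller extends Poller {
  def poll(timeout: Duration): Boolean = {
    timeout match {
      case fd: FiniteDuration if fd > Duration.Zero =>
        // Sleep for the whole timeout, since no I/O event can cut it short.
        Thread.sleep(fd.toMillis, (fd.toNanos % 1000000L).toInt)
      case _ =>
        // Zero: return control immediately.
        // Infinite: nothing could ever wake us, so do not block at all.
        ()
    }
    false // there is never any outstanding I/O
  }
}
```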
| 65 | + |
| 66 | +With tasks, timers, and the capability to poll for I/O, we can express the event loop algorithm. A single iteration of the loop looks like this: |
| 67 | + |
| 68 | +1. Check the current time and execute any expired timers. |
| 69 | + |
| 70 | +2. Execute up to 64 tasks, or until there are none left. We limit to 64 to ensure we are fair to timers and I/O. |
| 71 | + |
| 72 | +3. Poll for I/O events. There are three cases to consider: |
| 73 | + - **There is at least one task to do.** Call `poll(0.nanos)`, so it will process any available I/O events and then immediately return control. |
| 74 | + - **There are no tasks, but there is at least one outstanding timer.** Call `poll(durationToNextTimer)`, so it will sleep until the next I/O event arrives or the timer expires, whichever comes first. |
| 75 | + - **There are no tasks to do and no outstanding timers.** Call `poll(Duration.Inf)`, so it will sleep until the next I/O event arrives. |
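The three steps can be sketched as a single-threaded loop in Scala. This is a simplified illustration, not the actual `PollingExecutorScheduler` source: `EventLoopSketch`, `execute`, `sleep`, `runOnce`, and `run` are hypothetical names, expired timers simply run their callback inline, and the sketch assumes `poll` returns `false` immediately when no I/O is registered (otherwise `poll(Duration.Inf)` with nothing to wait for would block forever).

```scala
import scala.collection.mutable
import scala.concurrent.duration._

// Hypothetical sketch of the event loop described above; not the Cats Effect source.
abstract class EventLoopSketch {
  /** Sleep for at most `timeout` while handling I/O; true if I/O is still outstanding.
    * Assumed to return false immediately when no I/O is registered. */
  protected def poll(timeout: Duration): Boolean

  private final case class Timer(deadline: Long, task: Runnable)

  private val tasks = mutable.Queue.empty[Runnable]
  // mutable.PriorityQueue dequeues the *largest* element, so reverse the
  // ordering to get the earliest deadline first.
  private val timers =
    mutable.PriorityQueue.empty[Timer](Ordering.by[Timer, Long](_.deadline).reverse)

  def execute(task: Runnable): Unit = tasks.enqueue(task)

  def sleep(delay: FiniteDuration, task: Runnable): Unit =
    timers.enqueue(Timer(System.nanoTime() + delay.toNanos, task))

  /** One iteration of the loop; returns false once there is nothing left to do. */
  def runOnce(): Boolean = {
    // 1. Check the current time and run any expired timers.
    val now = System.nanoTime()
    while (timers.nonEmpty && timers.head.deadline <= now)
      timers.dequeue().task.run()

    // 2. Run up to 64 tasks, to stay fair to timers and I/O.
    var executed = 0
    while (executed < 64 && tasks.nonEmpty) {
      tasks.dequeue().run()
      executed += 1
    }

    // 3. Poll for I/O with a timeout chosen from the remaining work.
    val timeout: Duration =
      if (tasks.nonEmpty) Duration.Zero
      else if (timers.nonEmpty) math.max(0L, timers.head.deadline - System.nanoTime()).nanos
      else Duration.Inf
    val ioOutstanding = poll(timeout)

    tasks.nonEmpty || timers.nonEmpty || ioOutstanding
  }

  def run(): Unit = while (runOnce()) ()
}
```

A subclass only has to supply `poll`: plugging in a `poll` that merely sleeps gives a loop that interleaves tasks and timers but performs no I/O, while an epoll- or kqueue-backed `poll` turns the same loop into an I/O-integrated runtime.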
| 76 | + |
| 77 | +This algorithm is not a Cats Effect original: the [libuv event loop] works in essentially the same way. It is, however, a first step toward the much grander Cats Effect [I/O Integrated Runtime Concept]. The big idea is that every `WorkerThread` in the `WorkStealingThreadPool` that underpins the Cats Effect JVM runtime could run an event loop exactly like the one described above, for exceptionally high-performance I/O. |
| 78 | + |
| 79 | +#### Non-blocking I/O |
| 80 | + |
| 81 | +**So, how do we implement `poll`?** The bad news is that the answer is OS-specific, which is a big part of why projects such as libuv exist. Furthermore, the entire purpose of polling is to support non-blocking I/O, which falls outside the scope of Cats Effect. This brings us to FS2, and specifically the [`fs2-io`] module, where we want to implement non-blocking TCP [`Socket`]s. |
| 82 | + |
| 83 | +One such polling API is [epoll], available only on Linux: |
| 84 | + |
| 85 | +```c |
| 86 | +#include <sys/epoll.h> |
| 87 | + |
| 88 | +int epoll_create1(int flags); |
| 89 | + |
| 90 | +int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event); |
| 91 | + |
| 92 | +int epoll_wait(int epfd, struct epoll_event *events, |
| 93 | + int maxevents, int timeout); |
| 94 | +``` |
| 95 | + |
| 96 | +After creating an epoll instance (identified by a file descriptor) we can register sockets (also identified by file descriptors) with `epoll_ctl`. Typically we will register to be notified of the “read-ready” (`EPOLLIN`) and “write-ready” (`EPOLLOUT`) events on that socket. Finally, the actual polling is implemented with `epoll_wait`, which sleeps until the next I/O event is ready or the `timeout` expires. Thus we can use it to implement a `PollingExecutorScheduler`. |
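An epoll-based `poll` has to translate its `Duration` into the `int timeout` argument of `epoll_wait`, which is measured in milliseconds, where `-1` means block indefinitely and `0` means return immediately. Here is a minimal sketch of that translation; `epollTimeoutMillis` is a hypothetical helper, not epollcat's API, and rounding sub-millisecond timeouts up is one reasonable choice among several.

```scala
import scala.concurrent.duration._

// Hypothetical helper: convert a `poll` timeout into the millisecond
// `timeout` argument of epoll_wait (-1 = wait indefinitely, 0 = return immediately).
def epollTimeoutMillis(timeout: Duration): Int =
  timeout match {
    case fd: FiniteDuration =>
      if (fd <= Duration.Zero) 0
      else {
        // Round up, so a sub-millisecond wait does not become a 0 ms poll.
        val millis = fd.toMillis + (if (fd.toNanos % 1000000L != 0L) 1L else 0L)
        math.min(millis, Int.MaxValue.toLong).toInt // clamp to the C `int` range
      }
    case Duration.Inf => -1 // sleep until the next I/O event
    case _ => 0 // MinusInf / Undefined: treat as "do not wait"
  }
```

Rounding down instead would turn a 0.5 ms wait for the next timer into `epoll_wait(..., 0)`, which returns immediately and leaves the loop spinning until the timer actually expires.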
| 97 | + |
| 98 | +As previously mentioned, these sorts of polling APIs are ubiquitous and not just for working directly with sockets. For example, [libcurl] (the C library behind the well-known `curl` CLI) exposes a function for polling for I/O on all ongoing HTTP requests: |
| 99 | + |
| 100 | +```c |
| 101 | +#include <curl/curl.h> |
| 102 | + |
| 103 | +CURLMcode curl_multi_poll(CURLM *multi_handle, |
| 104 | + struct curl_waitfd extra_fds[], |
| 105 | + unsigned int extra_nfds, |
| 106 | + int timeout_ms, |
| 107 | + int *numfds); |
| 108 | +``` |
| 109 | + |
| 110 | +Indeed, this function underpins the `CurlExecutorScheduler` in [http4s-curl]. |
| 111 | + |
| 112 | +On macOS and BSDs the [kqueue] API plays an analogous role to epoll. We will not talk about Windows today :) |
| 113 | + |
| 114 | +**Long story short, I did not want the FS2 codebase to absorb all of this cross-OS complexity.** So in collaboration with Lee Tibbert we repurposed my cheeky [epollcat] experiment into an actual library implementing JDK NIO APIs (specifically, [`AsynchronousSocketChannel`] and friends). Since these are the same APIs used by the JVM implementation of `fs2-io`, it actually enables the `Socket` code to be completely shared with Native. |
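For context, here is what the callback-based JDK API looks like from Scala: a minimal sketch, using only `java.nio` and no Cats Effect, that adapts a single `AsynchronousSocketChannel.read` into a `scala.concurrent.Future`. `readOnce` is a hypothetical helper; an effectful library would more likely bridge the `CompletionHandler` with Cats Effect's `IO.async_` than with `Future`.

```scala
import java.nio.ByteBuffer
import java.nio.channels.{AsynchronousSocketChannel, CompletionHandler}
import scala.concurrent.{Future, Promise}

// Hypothetical helper: read once into `buf` and complete the Future with the
// number of bytes read (or -1 at end-of-stream), as reported by the JDK.
def readOnce(ch: AsynchronousSocketChannel, buf: ByteBuffer): Future[Int] = {
  val promise = Promise[Int]()
  ch.read[AnyRef](
    buf,
    null, // no attachment needed
    new CompletionHandler[Integer, AnyRef] {
      def completed(bytesRead: Integer, attachment: AnyRef): Unit =
        promise.success(bytesRead.intValue)
      def failed(error: Throwable, attachment: AnyRef): Unit =
        promise.failure(error)
    }
  )
  promise.future
}
```

Because the helper depends only on the `AsynchronousSocketChannel` API, code written against it does not need to know whether the channel is backed by the JVM's own implementation or by a reimplementation such as epollcat.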
| 115 | + |
| 116 | +[epollcat] implements an `EpollExecutorScheduler` for Linux and a `KqueueExecutorScheduler` for macOS. They additionally provide an API for monitoring a socket file descriptor for read-ready and write-ready events. |
| 117 | + |
| 118 | +```scala |
| 119 | +def monitor(fd: Int, reads: Boolean, writes: Boolean)( |
| 120 | + cb: EventNotificationCallback |
| 121 | +): Runnable // returns a `Runnable` to un-monitor the file descriptor |
| 122 | + |
| 123 | +trait EventNotificationCallback { |
| 124 | + def notifyEvents(readReady: Boolean, writeReady: Boolean): Unit |
| 125 | +} |
| 126 | +``` |
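To show how these pieces fit together, here is a minimal in-memory sketch of the bookkeeping behind `monitor`. `MonitorRegistry` and `dispatch` are hypothetical names, and the sketch leaves out the actual `epoll_ctl`/`kevent` registration: it only records which callback wants which events, and hands each ready file descriptor's events to its callback, as a poller would after `epoll_wait` returns. The `EventNotificationCallback` trait is repeated from above so the block compiles on its own.

```scala
import scala.collection.mutable

trait EventNotificationCallback {
  def notifyEvents(readReady: Boolean, writeReady: Boolean): Unit
}

// Hypothetical stand-in for a poller's registration table. A real implementation
// would also (de)register `fd` with the kernel via epoll_ctl or kevent.
final class MonitorRegistry {
  private final case class Interest(reads: Boolean, writes: Boolean, cb: EventNotificationCallback)
  private val interests = mutable.Map.empty[Int, Interest]

  def monitor(fd: Int, reads: Boolean, writes: Boolean)(
      cb: EventNotificationCallback
  ): Runnable = {
    val interest = Interest(reads, writes, cb)
    interests(fd) = interest
    // The returned Runnable un-monitors fd, unless fd was re-monitored since.
    () => { if (interests.get(fd).exists(_ eq interest)) interests.remove(fd); () }
  }

  // Called by the poller for each ready fd reported by epoll_wait / kevent.
  def dispatch(fd: Int, readReady: Boolean, writeReady: Boolean): Unit =
    interests.get(fd).foreach { i =>
      val r = readReady && i.reads // only report events the callback asked for
      val w = writeReady && i.writes
      if (r || w) i.cb.notifyEvents(r, w)
    }
}
```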
| 127 | + |
| 128 | +These are then used to implement the callback-based `read` and `write` methods of the JDK `AsynchronousSocketChannel`. |
| 129 | + |
| 130 | +It is worth pointing out that the JVM actually implements `AsynchronousSocketChannel` with an event loop as well. The difference is that on the JVM, this event loop is used only for I/O and runs on a separate thread from the compute pool used for fibers and the scheduler thread used for timers. Meanwhile, epollcat is an example of an [I/O integrated runtime][I/O Integrated Runtime Concept] where fibers, timers, and I/O are all interleaved on a single thread. |
| 131 | + |
| 132 | +#### TLS |
| 133 | + |
| 134 | +**The last critical piece of the cross-build puzzle was a [TLS] implementation** for [`TLSSocket`] and related APIs in FS2. Although the prospect of this was daunting, in the end it was actually fairly straightforward to directly integrate with [s2n-tls], which exposes a well-designed and well-documented C API. This is effectively the only non-Scala dependency required to use the Typelevel stack on Native. |
| 135 | + |
| 136 | +Finally, special thanks to Ondra Pelech and Lorenzo Gabriele for cross-building [scala-java-time] and [scala-java-locales] for Native and David Strawn for developing [idna4s]. These projects fill important gaps in the Scala Native re-implementation of the JDK and were essential to seamless cross-building. |
| 137 | + |
| 138 | +And ... that is pretty much it. **From here, any library or application that is built using Cats Effect and FS2 cross-builds for Scala Native effectively for free.** Three spectacular examples of this are: |
| 139 | + |
| 140 | +* [http4s] Ember, a server+client duo with HTTP/2 support |
| 141 | +* [Skunk], a Postgres client |
| 142 | +* [rediculous], a Redis client |
| 143 | + |
| 144 | +These libraries in turn unlock projects such as [feral], [Grackle], and [smithy4s]. |
| 145 | + |
| 146 | +### What’s next and how can I get involved? |
| 147 | + |
| 148 | +Please try the Typelevel Native stack! Even better, deploy it, and do so loudly! |
| 149 | + |
| 150 | +Besides that, here is a brain-dump of project ideas and existing projects that would love contributors. I am happy to help folks get started on any of these, or ideas of your own! |
| 151 | + |
| 152 | +* Creating example applications, templates, and tutorials: |
| 153 | + - If you are short on inspiration, try cross-building existing examples such as [fs2-chat], [kitteh-redis], and [Jobby]. |
| 154 | + - Spread the word: [you-forgot-a-percentage-sign-or-a-colon]. |
| 155 | + |
| 156 | +* Cross-building existing libraries and developing new, Typelevel-stack ones: |
| 157 | + - Go [feral] and implement a pure Scala [custom AWS Lambda runtime] that cross-builds for Native. |
| 158 | + - A pure Scala [gRPC] implementation built on http4s would be fantastic, even for the JVM. Christopher Davenport has published a [proof-of-concept][grpc-playground]. |
| 159 | + - [fs2-data] has pure Scala support for a plethora of data formats. The [http4s-fs2-data] integration needs your help to get off the ground! |
| 160 | + - Lack of cross-platform cryptography is one of the remaining sore points in cross-building. I started the [bobcats] project to fill the gap but I am afraid it needs love from a more dedicated maintainer. |
| 161 | + |
| 162 | +* Integrations with native libraries: |
| 163 | + - I kick-started [http4s-curl] and would love to see someone take the reins! |
| 164 | + - An [NGINX Unit] server backend for http4s promises exceptional performance. [snunit] pioneered this approach. |
| 165 | + - Using [quiche] for HTTP/3 looks yummy! |
| 166 | + - An idiomatic wrapper for [SQLite]. See also [davenverse/sqlite-sjs#1], which proposes a cross-platform API backed by Doobie on the JVM. |
| 167 | + |
| 168 | +* Developing I/O-integrated runtimes: |
| 169 | + - [epollcat] supports Linux and macOS and has plenty of opportunity for optimization and development. |
| 170 | + - A [libuv]-based runtime would have solid cross-OS support, including Windows. Prior art in [scala-native-loop]. |
| 171 | + - Personally I am excited to work on an [io_uring] runtime. |
| 172 | + |
| 173 | +* Tooling. Anton Sviridov has spearheaded two major projects in this area: |
| 174 | + - [sbt-vcpkg] is working hard to solve the native dependency problem. |
| 175 | + - [sn-bindgen] generates Scala Native bindings to native libraries directly from `*.h` header files. I found it immensely useful while working on http4s-curl, epollcat, and the s2n-tls integration in FS2. |
| 176 | + - Also: we are _badly_ in need of a pure Scala port of the [Java Microbenchmark Harness]. Not the whole thing obviously, but just enough to run the existing Cats Effect benchmarks for example. |
| 177 | + |
| 178 | +* Scala Native itself. Lots to do there! |
| 179 | + |
| 180 | +### Ember native benchmark |
| 181 | + |
| 182 | +```console |
| 183 | +$ hey -z 30s http://localhost:8080 |
| 184 | + |
| 185 | +Summary: |
| 186 | + Total: 30.0160 secs |
| 187 | + Slowest: 0.3971 secs |
| 188 | + Fastest: 0.0012 secs |
| 189 | + Average: 0.0131 secs |
| 190 | + Requests/sec: 3815.4647 |
| 191 | + |
| 192 | + Total data: 1145250 bytes |
| 193 | + Size/request: 10 bytes |
| 194 | + |
| 195 | +Response time histogram: |
| 196 | + 0.001 [1] | |
| 197 | + 0.041 [114486] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ |
| 198 | + 0.080 [7] | |
| 199 | + 0.120 [5] | |
| 200 | + 0.160 [5] | |
| 201 | + 0.199 [3] | |
| 202 | + 0.239 [5] | |
| 203 | + 0.278 [3] | |
| 204 | + 0.318 [4] | |
| 205 | + 0.357 [3] | |
| 206 | + 0.397 [3] | |
| 207 | + |
| 208 | + |
| 209 | +Latency distribution: |
| 210 | + 10% in 0.0119 secs |
| 211 | + 25% in 0.0121 secs |
| 212 | + 50% in 0.0122 secs |
| 213 | + 75% in 0.0125 secs |
| 214 | + 90% in 0.0133 secs |
| 215 | + 95% in 0.0224 secs |
| 216 | + 99% in 0.0234 secs |
| 217 | + |
| 218 | +Details (average, fastest, slowest): |
| 219 | + DNS+dialup: 0.0000 secs, 0.0012 secs, 0.3971 secs |
| 220 | + DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0011 secs |
| 221 | + req write: 0.0000 secs, 0.0000 secs, 0.0013 secs |
| 222 | + resp wait: 0.0131 secs, 0.0011 secs, 0.3941 secs |
| 223 | + resp read: 0.0000 secs, 0.0000 secs, 0.0010 secs |
| 224 | + |
| 225 | +Status code distribution: |
| 226 | + [200] 114525 responses |
| 227 | +``` |
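As a quick sanity check on the numbers above: throughput is simply responses over wall-clock time, the data total is responses times response size, and the throughput also agrees with Little's law if we assume `hey`'s default of 50 concurrent workers (the command above does not pass `-c`).

$$
\frac{114525\ \text{responses}}{30.0160\ \text{s}} \approx 3815.5\ \text{requests/s},
\qquad
114525 \times 10\ \text{B} = 1145250\ \text{B},
\qquad
\frac{50\ \text{workers}}{0.0131\ \text{s}} \approx 3817\ \text{requests/s}
$$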
| 228 | + |
| 229 | +[`AsynchronousSocketChannel`]: https://docs.oracle.com/javase/8/docs/api/java/nio/channels/AsynchronousSocketChannel.html |
| 230 | +[bobcats]: https://github.com/typelevel/bobcats |
| 231 | +[Cats Effect]: https://typelevel.org/cats-effect/ |
| 232 | +[custom AWS Lambda runtime]: https://docs.aws.amazon.com/lambda/latest/dg/runtimes-custom.html |
| 233 | +[davenverse/sqlite-sjs#1]: https://github.com/davenverse/sqlite-sjs/pull/1 |
| 234 | +[`ExecutionContext`]: https://www.scala-lang.org/api/2.13.8/scala/concurrent/ExecutionContext.html |
| 235 | +[event loop]: https://javascript.info/event-loop |
| 236 | +[epoll]: https://man7.org/linux/man-pages/man7/epoll.7.html |
| 237 | +[epollcat]: https://github.com/armanbilge/epollcat |
| 238 | +[feral]: https://github.com/typelevel/feral |
| 239 | +[FS2]: https://fs2.io/ |
| 240 | +[fs2-chat]: https://github.com/typelevel/fs2-chat/ |
| 241 | +[fs2-data]: https://github.com/satabin/fs2-data/ |
| 242 | +[`fs2-io`]: https://fs2.io/#/io |
| 243 | +[GraalVM Native Image]: https://www.graalvm.org/22.2/reference-manual/native-image/ |
| 244 | +[Grackle]: https://github.com/gemini-hlsw/gsp-graphql |
| 245 | +[gRPC]: https://grpc.io/ |
| 246 | +[grpc-playground]: https://github.com/ChristopherDavenport/grpc-playground |
| 247 | +[http4s]: https://http4s.org/ |
| 248 | +[http4s-curl]: https://github.com/http4s/http4s-curl/ |
| 249 | +[http4s-fs2-data]: https://github.com/http4s/http4s-fs2-data |
| 250 | +[idna4s]: https://github.com/typelevel/idna4s |
| 251 | +[I/O Integrated Runtime Concept]: https://github.com/typelevel/cats-effect/discussions/3070 |
| 252 | +[io_uring]: https://en.wikipedia.org/wiki/Io_uring |
| 253 | +[Jobby]: https://github.com/keynmol/jobby/ |
| 254 | +[kitteh-redis]: https://github.com/djspiewak/kitteh-redis |
| 255 | +[kqueue]: https://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2 |
| 256 | +[Java Microbenchmark Harness]: https://github.com/openjdk/jmh |
| 257 | +[libuv]: https://github.com/libuv/libuv/ |
| 258 | +[libuv event loop]: https://docs.libuv.org/en/v1.x/design.html#the-i-o-loop |
| 259 | +[libcurl]: https://curl.se/libcurl/ |
| 260 | +[LLVM]: https://llvm.org/ |
| 261 | +[`MonadCancel`]: https://typelevel.org/cats-effect/docs/typeclasses/monadcancel |
| 262 | +[NGINX Unit]: https://unit.nginx.org/ |
| 263 | +[`PollingExecutorScheduler`]: https://github.com/typelevel/cats-effect/blob/7ca03db50342773a79a01ecf137d953408ac6b1d/core/native/src/main/scala/cats/effect/unsafe/PollingExecutorScheduler.scala |
| 264 | +[quiche]: https://github.com/cloudflare/quiche |
| 265 | +[rediculous]: https://github.com/davenverse/rediculous |
| 266 | +[`Resource`]: https://typelevel.org/cats-effect/docs/std/resource |
| 267 | +[sbt-vcpkg]: https://github.com/indoorvivants/sbt-vcpkg/ |
| 268 | +[ScalablyTyped]: https://scalablytyped.org/ |
| 269 | +[Scala Native]: https://scala-native.org/ |
| 270 | +[Scala.js]: https://www.scala-js.org/ |
| 271 | +[scala-java-locales]: https://github.com/cquiroz/scala-java-locales |
| 272 | +[scala-java-time]: https://github.com/cquiroz/scala-java-time |
| 273 | +[scala-native-loop]: https://github.com/scala-native/scala-native-loop/ |
| 274 | +[`Scheduler`]: https://github.com/typelevel/cats-effect/blob/236a0db0e95be829de34d7a8e3c06914738b7b06/core/shared/src/main/scala/cats/effect/unsafe/Scheduler.scala |
| 275 | +[Skunk]: https://github.com/tpolecat/skunk |
| 276 | +[smithy4s]: https://disneystreaming.github.io/smithy4s/ |
| 277 | +[`Socket`]: https://www.javadoc.io/doc/co.fs2/fs2-docs_2.13/latest/fs2/io/net/Socket.html |
| 278 | +[SQLite]: https://www.sqlite.org/index.html |
| 279 | +[snunit]: https://github.com/lolgab/snunit |
| 280 | +[sn-bindgen]: https://github.com/indoorvivants/sn-bindgen |
| 281 | +[s2n-tls]: https://github.com/aws/s2n-tls |
| 282 | +[TLS]: https://en.wikipedia.org/wiki/Transport_Layer_Security |
| 283 | +[`TLSSocket`]: https://www.javadoc.io/doc/co.fs2/fs2-docs_2.13/latest/fs2/io/net/tls/TLSSocket.html |
| 284 | +[web assembly]: https://twitter.com/ShadajL/status/1548020571597811719 |
| 285 | +[you-forgot-a-percentage-sign-or-a-colon]: https://youforgotapercentagesignoracolon.com/ |