Commit 303afdd

Merge pull request #397 from armanbilge/blog/typelevel-native
Typelevel Native blog post
2 parents 460dd7f + 60e110d commit 303afdd

2 files changed

Lines changed: 289 additions & 0 deletions

_data/authors.yml

Lines changed: 4 additions & 0 deletions
@@ -405,3 +405,7 @@ rahsan:
   full_name: "Raas Ahsan"
   twitter: "RaasAhsan"
   github: "RaasAhsan"
+armanbilge:
+  full_name: "Arman Bilge"
+  twitter: "armanbilge"
+  github: "armanbilge"
Lines changed: 285 additions & 0 deletions
@@ -0,0 +1,285 @@
---
layout: post
title: Typelevel Native
category: technical

meta:
  nav: blog
  author: armanbilge
---
10+
11+
We recently published several major Typelevel projects for the [Scala Native] platform, most notably [Cats Effect], [FS2], and [http4s]. This blog post explores what this new platform means for the Typelevel ecosystem and how it works under the hood.
12+
13+
### What is Scala Native?
14+
15+
[Scala Native] is an optimizing ahead-of-time compiler for the Scala language. Put simply: it enables you to **compile Scala code directly to native executables**.
16+
17+
It is an ambitious project following in the steps of [Scala.js]. Instead of targeting JavaScript, the Scala Native compiler targets the [LLVM] IR and uses its toolchain to generate native executables for a range of architectures, including x86, ARM, and, in the near future, [WebAssembly][web assembly].
18+
19+
### Why is this exciting?
20+
21+
**For Scala in general**, funnily enough I think [GraalVM Native Image] does a great job summarizing the advantages of native executables, namely:
22+
* instant startup that immediately achieves peak performance, without requiring warmup or the heavy footprint of the JVM
* packageable into small, self-contained binaries for easy deployment and distribution
24+
25+
It is worth mentioning that in benchmarks Scala Native handily beats GraalVM Native Image on startup time, runtime footprint, and binary size.
26+
27+
Moreover, breaking free from the JVM is an opportunity to design a runtime specifically optimized for the Scala language itself. This is the true potential of the Scala Native project.
28+
29+
**For Typelevel in particular**, Scala Native opens new doors for leveling up our ecosystem. Our flagship libraries are largely designed for deploying high-performance, I/O-bound microservices, and for the first time ever **we now have direct access to kernel I/O APIs**.
30+
31+
I am also enthusiastic to use Cats Effect with (non-Scala) native libraries that expose a C API. [`Resource`] and more generally [`MonadCancel`] are powerful tools for safely navigating manual memory management with all the goodness of error-handling and cancelation.
32+
33+
### How can I try it?
34+
35+
Christopher Davenport has put up a [scala-native-ember-example](https://github.com/ChristopherDavenport/scala-native-ember-example) and reported some [benchmark results](#ember-native-benchmark)!
36+
37+
### How does it work?
38+
39+
The burden of cross-building the Typelevel ecosystem for Scala Native fell almost entirely to [Cats Effect] and [FS2].
40+
41+
#### Event loop runtime
42+
43+
**To cross-build Cats Effect for Native we had to get creative** because Scala Native currently does not support multithreading (although it will in the next major release). This is a similar situation to the JavaScript runtime, which is also fundamentally single-threaded. But an important difference is that JS runtimes are implemented with an [event loop] and offer callback-based APIs for scheduling timers and performing non-blocking I/O. An *event loop* is a type of runtime that enables compute tasks, timers, and non-blocking I/O to be interleaved on a single thread (although not every event loop does all these things).
44+
45+
Meanwhile, Scala Native core neither implements an event loop nor offers such APIs. There is the [scala-native-loop] project, which wraps the [libuv] event loop runtime, but we did not want to bake such an opinionated dependency into Cats Effect core.
46+
47+
Fortunately Daniel Spiewak had the fantastic insight that the “dummy runtime” which I created to initially cross-build Cats Effect for Native could be reformulated into a legitimate event loop implementation by extending it with the capability to “poll” for I/O events: a `PollingExecutorScheduler`.
48+
49+
The [`PollingExecutorScheduler`] implements both [`ExecutionContext`] and [`Scheduler`] and maintains two queues:
50+
- a queue of tasks (read: fibers) to execute
- a priority queue of timers (read: `IO.sleep(...)`), sorted by expiration
52+
53+
It also defines an abstract method:
54+
```scala
def poll(timeout: Duration): Boolean
```
57+
58+
The idea of this method is very similar to `Thread.sleep()` except that besides sleeping it may also “poll” for I/O events. It turns out that APIs like this are ubiquitous in C libraries that perform I/O.
59+
60+
To demonstrate the API contract, consider invoking `poll(3.seconds)`:
61+
62+
*I have nothing to do for the next 3 seconds. So wake me up then, or earlier if there is an incoming I/O event that I should handle. But wake me up no later!*
63+
64+
*Oh, and don’t forget to tell me whether there are still outstanding I/O events (`true`) or not (`false`) so I know if I need to call you again. Thanks!*
65+
66+
With tasks, timers, and the capability to poll for I/O, we can express the event loop algorithm. A single iteration of the loop looks like this:
67+
68+
1. Check the current time and execute any expired timers.

2. Execute up to 64 tasks, or until there are none left. We limit to 64 to ensure we are fair to timers and I/O.

3. Poll for I/O events. There are three cases to consider:
   - **There is at least one task to do.** Call `poll(0.nanos)`, so it will process any available I/O events and then immediately return control.
   - **There are no tasks to do but at least one outstanding timer.** Call `poll(durationToNextTimer)`, so it will sleep until the next I/O event arrives or the timeout expires, whichever comes first.
   - **There are no tasks to do and no outstanding timers.** Call `poll(Duration.Inf)`, so it will sleep until the next I/O event arrives.
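
To make the three cases concrete, here is a minimal single-threaded sketch of such a loop in plain Scala. It is not the Cats Effect source: `ToyEventLoop`, `schedule`, `run`, and `SleepOnlyLoop` are hypothetical names, and the exit condition (stop once there are no tasks, no timers, and no outstanding I/O) is an assumption of this sketch.

```scala
import scala.collection.mutable
import scala.concurrent.duration._

// Hypothetical toy loop for illustration; not the real PollingExecutorScheduler.
abstract class ToyEventLoop {
  // Same contract as the abstract `poll` method described above.
  def poll(timeout: Duration): Boolean

  private final case class Timer(deadlineNanos: Long, task: Runnable)

  private val tasks = mutable.Queue.empty[Runnable]
  // PriorityQueue is a max-heap, so reverse the ordering to get the earliest deadline first.
  private val timers =
    mutable.PriorityQueue.empty[Timer](Ordering.by[Timer, Long](_.deadlineNanos).reverse)

  def execute(task: Runnable): Unit = tasks.enqueue(task)

  def schedule(delay: FiniteDuration, task: Runnable): Unit =
    timers.enqueue(Timer(System.nanoTime() + delay.toNanos, task))

  def run(): Unit = {
    var running = true
    while (running) {
      // 1. Run every timer whose deadline has passed.
      val now = System.nanoTime()
      while (timers.nonEmpty && timers.head.deadlineNanos - now <= 0L)
        timers.dequeue().task.run()

      // 2. Run at most 64 tasks so that timers and I/O are not starved.
      var budget = 64
      while (budget > 0 && tasks.nonEmpty) {
        tasks.dequeue().run()
        budget -= 1
      }

      // 3. Choose the poll timeout according to the three cases.
      val timeout: Duration =
        if (tasks.nonEmpty) Duration.Zero
        else if (timers.nonEmpty)
          (timers.head.deadlineNanos - System.nanoTime()).max(0L).nanos
        else Duration.Inf
      val outstandingIO = poll(timeout)

      // Assumption of this sketch: stop when nothing is left to do.
      running = outstandingIO || tasks.nonEmpty || timers.nonEmpty
    }
  }
}

// A loop with no I/O sources: `poll` can only sleep and never reports outstanding events.
final class SleepOnlyLoop extends ToyEventLoop {
  def poll(timeout: Duration): Boolean = {
    timeout match {
      case d: FiniteDuration if d > Duration.Zero => Thread.sleep(d.toMillis + 1)
      case _ => () // zero, or infinite with nothing that could ever wake us
    }
    false
  }
}
```

With `SleepOnlyLoop`, a task submitted through `execute` runs on the first iteration, while a timer registered through `schedule` runs only after the loop has slept in `poll` until its deadline. A real implementation would replace the body of `poll` with a call into an OS polling API.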
76+
77+
This algorithm is not a Cats Effect original: the [libuv event loop] works in essentially the same way. It is however a first step toward the much grander Cats Effect [I/O Integrated Runtime Concept]. The big idea is that every `WorkerThread` in the `WorkStealingThreadPool` that underpins the Cats Effect JVM runtime can run an event loop exactly like the one described above, for exceptionally high-performance I/O.
78+
79+
#### Non-blocking I/O
80+
81+
**So, how do we implement `poll`?** The bad news is that the answer is OS-specific, which is a large reason why projects such as libuv exist. Furthermore, the entire purpose of polling is to support non-blocking I/O, which falls outside of the scope of Cats Effect. This brings us to FS2, and specifically the [`fs2-io`] module where we want to implement non-blocking TCP [`Socket`]s.
82+
83+
One such polling API is [epoll], available only on Linux:
84+
85+
```c
#include <sys/epoll.h>

int epoll_create1(int flags);

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

int epoll_wait(int epfd, struct epoll_event *events,
               int maxevents, int timeout);
```
95+
96+
After creating an epoll instance (identified by a file descriptor) we can register sockets (also identified by file descriptors) with `epoll_ctl`. Typically we will register to be notified of the “read-ready” (`EPOLLIN`) and “write-ready” (`EPOLLOUT`) events on that socket. Finally, the actual polling is implemented with `epoll_wait`, which sleeps until the next I/O event is ready or the `timeout` expires. Thus we can use it to implement a `PollingExecutorScheduler`.
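
One detail when bridging `poll(timeout: Duration)` to `epoll_wait` is the timeout argument: `epoll_wait` takes an `int` number of milliseconds, where `-1` blocks indefinitely and `0` returns immediately. The following is a minimal sketch of that conversion, not the epollcat implementation; `EpollTimeout.toMillis` is a hypothetical helper. It rounds up to the next millisecond to avoid spinning on sub-millisecond timeouts, at the cost of waking slightly late. Truncating instead would honor the "no later" part of the contract more strictly.

```scala
import scala.concurrent.duration._

// Hypothetical helper for illustration; not taken from epollcat.
object EpollTimeout {
  private val NanosPerMilli = 1000000L

  // Convert a poll timeout into the millisecond argument expected by epoll_wait.
  def toMillis(timeout: Duration): Int = timeout match {
    case d: FiniteDuration if d > Duration.Zero =>
      val nanos = d.toNanos
      // Round up to the next whole millisecond.
      val millis = nanos / NanosPerMilli + (if (nanos % NanosPerMilli == 0L) 0L else 1L)
      // Clamp to the range of a C int.
      math.min(millis, Int.MaxValue.toLong).toInt
    case Duration.Inf => -1 // block until an I/O event arrives
    case _ => 0 // zero or negative: check for events and return immediately
  }
}
```

For example, `EpollTimeout.toMillis(3.seconds)` yields `3000`, `EpollTimeout.toMillis(Duration.Zero)` yields `0`, and `EpollTimeout.toMillis(Duration.Inf)` yields `-1`.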
97+
98+
As previously mentioned, these sorts of polling APIs are ubiquitous and not just for working directly with sockets. For example, [libcurl](https://curl.se/libcurl/) (the C library behind the well-known CLI) exposes a function for polling for I/O on all ongoing HTTP requests.
99+
100+
```c
#include <curl/curl.h>

CURLMcode curl_multi_poll(CURLM *multi_handle,
                          struct curl_waitfd extra_fds[],
                          unsigned int extra_nfds,
                          int timeout_ms,
                          int *numfds);
```
109+
110+
Indeed, this function underpins the `CurlExecutorScheduler` in [http4s-curl].
111+
112+
On macOS and BSDs the [kqueue] API plays an analogous role to epoll. We will not talk about Windows today :)
113+
114+
**Long story short, I did not want the FS2 codebase to absorb all of this cross-OS complexity.** So in collaboration with Lee Tibbert we repurposed my cheeky [epollcat] experiment into an actual library implementing JDK NIO APIs (specifically, [`AsynchronousSocketChannel`] and friends). Since these are the same APIs used by the JVM implementation of `fs2-io`, it actually enables the `Socket` code to be completely shared with Native.
115+
116+
[epollcat] implements an `EpollExecutorScheduler` for Linux and a `KqueueExecutorScheduler` for macOS. They additionally provide an API for monitoring a socket file descriptor for read-ready and write-ready events.
117+
118+
```scala
def monitor(fd: Int, reads: Boolean, writes: Boolean)(
  cb: EventNotificationCallback
): Runnable // returns a `Runnable` to un-monitor the file descriptor

trait EventNotificationCallback {
  def notifyEvents(readReady: Boolean, writeReady: Boolean): Unit
}
```
127+
128+
These are then used to implement the callback-based `read` and `write` methods of the JDK `AsynchronousSocketChannel`.
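
As a rough illustration of how a callback-based operation can be layered on top of `monitor`, the sketch below waits for a single read-ready notification and then un-monitors the file descriptor. Everything except the `monitor` signature and `EventNotificationCallback` is an assumption: `FakeMonitor` is an in-memory stand-in for a real epoll- or kqueue-backed scheduler, `fire` is a test hook that simulates an OS event, and `ReadReady.once` is a hypothetical helper rather than epollcat code.

```scala
import scala.collection.mutable

trait EventNotificationCallback {
  def notifyEvents(readReady: Boolean, writeReady: Boolean): Unit
}

// Hypothetical in-memory stand-in for an epoll- or kqueue-backed scheduler.
final class FakeMonitor {
  private final case class Interest(reads: Boolean, writes: Boolean, cb: EventNotificationCallback)
  private val interests = mutable.Map.empty[Int, Interest]

  def monitor(fd: Int, reads: Boolean, writes: Boolean)(
      cb: EventNotificationCallback
  ): Runnable = {
    interests(fd) = Interest(reads, writes, cb)
    // The returned Runnable un-monitors the file descriptor.
    new Runnable { def run(): Unit = { interests.remove(fd); () } }
  }

  // Test hook (an assumption of this sketch): pretend the OS reported events for `fd`.
  def fire(fd: Int, readReady: Boolean, writeReady: Boolean): Unit =
    interests.get(fd).foreach { i =>
      // Deliver only the kinds of events that were registered.
      val r = i.reads && readReady
      val w = i.writes && writeReady
      if (r || w) i.cb.notifyEvents(r, w)
    }
}

object ReadReady {
  // Invoke `k` once, the first time `fd` becomes readable, then stop monitoring.
  def once(m: FakeMonitor, fd: Int)(k: () => Unit): Unit = {
    var unmonitor: Runnable = null
    unmonitor = m.monitor(fd, reads = true, writes = false)(
      new EventNotificationCallback {
        def notifyEvents(readReady: Boolean, writeReady: Boolean): Unit =
          if (readReady) {
            unmonitor.run()
            k()
          }
      }
    )
  }
}
```

A real `read` would perform the actual non-blocking read system call inside the continuation and then complete the caller's handler with the number of bytes read.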
129+
130+
It is worth pointing out that the JVM actually implements `AsynchronousSocketChannel` with an event loop as well. The difference is that on the JVM, this event loop is used only for I/O and runs on a separate thread from the compute pool used for fibers and the scheduler thread used for timers. Meanwhile, epollcat is an example of an [I/O integrated runtime][I/O Integrated Runtime Concept] where fibers, timers, and I/O are all interleaved on a single thread.
131+
132+
#### TLS
133+
134+
**The last critical piece of the cross-build puzzle was a [TLS] implementation** for [`TLSSocket`] and related APIs in FS2. Although the prospect of this was daunting, in the end it was actually fairly straightforward to directly integrate with [s2n-tls], which exposes a well-designed and well-documented C API. This is effectively the only non-Scala dependency required to use the Typelevel stack on Native.
135+
136+
Finally, special thanks to Ondra Pelech and Lorenzo Gabriele for cross-building [scala-java-time] and [scala-java-locales] for Native, and to David Strawn for developing [idna4s]. These projects fill important gaps in the Scala Native re-implementation of the JDK and were essential to seamless cross-building.
137+
138+
And ... that is pretty much it. **From here, any library or application that is built using Cats Effect and FS2 cross-builds for Scala Native effectively for free.** Three spectacular examples of this are:
139+
140+
* [http4s] Ember, a server+client duo with HTTP/2 support
* [Skunk], a Postgres client
* [rediculous], a Redis client
143+
144+
These libraries in turn unlock projects such as [feral], [Grackle], and [smithy4s].
145+
146+
### What’s next and how can I get involved?
147+
148+
Please try the Typelevel Native stack! Even better, deploy it, and do so loudly!
149+
150+
Besides that, here is a brain-dump of project ideas and existing projects that would love contributors. I am happy to help folks get started on any of these, or ideas of your own!
151+
152+
* Creating example applications, templates, and tutorials:
  - If you are short on inspiration, try cross-building existing examples such as [fs2-chat], [kitteh-redis], and [Jobby].
  - Spread the word: [you-forgot-a-percentage-sign-or-a-colon].
155+
156+
* Cross-building existing libraries and developing new, Typelevel-stack ones:
  - Go [feral] and implement a pure Scala [custom AWS Lambda runtime] that cross-builds for Native.
  - A pure Scala [gRPC] implementation built on http4s would be fantastic, even for the JVM. Christopher Davenport has published a [proof-of-concept][grpc-playground].
  - [fs2-data] has pure Scala support for a plethora of data formats. The [http4s-fs2-data] integration needs your help to get off the ground!
  - Lack of cross-platform cryptography is one of the remaining sore points in cross-building. I started the [bobcats] project to fill the gap but I am afraid it needs love from a more dedicated maintainer.
161+
162+
* Integrations with native libraries:
  - I kick-started [http4s-curl] and would love to see someone take the reins!
  - An [NGINX Unit] server backend for http4s promises exceptional performance. [snunit] pioneered this approach.
  - Using [quiche] for HTTP/3 looks yummy!
  - An idiomatic wrapper for [SQLite]. See also [davenverse/sqlite-sjs#1], which proposes a cross-platform API backed by Doobie on the JVM.
167+
168+
* Developing I/O-integrated runtimes:
  - [epollcat] supports Linux and macOS and has plenty of opportunity for optimization and development.
  - A [libuv]-based runtime would have solid cross-OS support, including Windows. Prior art in [scala-native-loop].
  - Personally I am excited to work on an [io_uring] runtime.
172+
173+
* Tooling. Anton Sviridov has spearheaded two major projects in this area:
  - [sbt-vcpkg] is working hard to solve the native dependency problem.
  - [sn-bindgen] generates Scala Native bindings to native libraries directly from `*.h` header files. I found it immensely useful while working on http4s-curl, epollcat, and the s2n-tls integration in FS2.
  - Also: we are _badly_ in need of a pure Scala port of the [Java Microbenchmark Harness]. Not the whole thing obviously, but just enough to run the existing Cats Effect benchmarks for example.
177+
178+
* Scala Native itself. Lots to do there!
179+
180+
### Ember native benchmark
181+
182+
```console
$ hey -z 30s http://localhost:8080

Summary:
  Total:        30.0160 secs
  Slowest:      0.3971 secs
  Fastest:      0.0012 secs
  Average:      0.0131 secs
  Requests/sec: 3815.4647

  Total data:   1145250 bytes
  Size/request: 10 bytes

Response time histogram:
  0.001 [1]      |
  0.041 [114486] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.080 [7]      |
  0.120 [5]      |
  0.160 [5]      |
  0.199 [3]      |
  0.239 [5]      |
  0.278 [3]      |
  0.318 [4]      |
  0.357 [3]      |
  0.397 [3]      |


Latency distribution:
  10% in 0.0119 secs
  25% in 0.0121 secs
  50% in 0.0122 secs
  75% in 0.0125 secs
  90% in 0.0133 secs
  95% in 0.0224 secs
  99% in 0.0234 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0012 secs, 0.3971 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0011 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0013 secs
  resp wait:    0.0131 secs, 0.0011 secs, 0.3941 secs
  resp read:    0.0000 secs, 0.0000 secs, 0.0010 secs

Status code distribution:
  [200] 114525 responses
```
228+
229+
[`AsynchronousSocketChannel`]: https://docs.oracle.com/javase/8/docs/api/java/nio/channels/AsynchronousSocketChannel.html
230+
[bobcats]: https://github.com/typelevel/bobcats
231+
[Cats Effect]: https://typelevel.org/cats-effect/
232+
[custom AWS Lambda runtime]: https://docs.aws.amazon.com/lambda/latest/dg/runtimes-custom.html
233+
[davenverse/sqlite-sjs#1]: https://github.com/davenverse/sqlite-sjs/pull/1
234+
[`ExecutionContext`]: https://www.scala-lang.org/api/2.13.8/scala/concurrent/ExecutionContext.html
235+
[event loop]: https://javascript.info/event-loop
236+
[epoll]: https://man7.org/linux/man-pages/man7/epoll.7.html
237+
[epollcat]: https://github.com/armanbilge/epollcat
238+
[feral]: https://github.com/typelevel/feral
239+
[FS2]: https://fs2.io/
240+
[fs2-chat]: https://github.com/typelevel/fs2-chat/
241+
[fs2-data]: https://github.com/satabin/fs2-data/
242+
[`fs2-io`]: https://fs2.io/#/io
243+
[GraalVM Native Image]: https://www.graalvm.org/22.2/reference-manual/native-image/
244+
[Grackle]: https://github.com/gemini-hlsw/gsp-graphql
245+
[gRPC]: https://grpc.io/
246+
[grpc-playground]: https://github.com/ChristopherDavenport/grpc-playground
247+
[http4s]: https://http4s.org/
248+
[http4s-curl]: https://github.com/http4s/http4s-curl/
249+
[http4s-fs2-data]: https://github.com/http4s/http4s-fs2-data
250+
[idna4s]: https://github.com/typelevel/idna4s
251+
[I/O Integrated Runtime Concept]: https://github.com/typelevel/cats-effect/discussions/3070
252+
[io_uring]: https://en.wikipedia.org/wiki/Io_uring
253+
[Jobby]: https://github.com/keynmol/jobby/
254+
[kitteh-redis]: https://github.com/djspiewak/kitteh-redis
255+
[kqueue]: https://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2
256+
[Java Microbenchmark Harness]: https://github.com/openjdk/jmh
257+
[libuv]: https://github.com/libuv/libuv/
258+
[libuv event loop]: https://docs.libuv.org/en/v1.x/design.html#the-i-o-loop
259+
[libcurl]: https://curl.se/libcurl/
260+
[LLVM]: https://llvm.org/
261+
[`MonadCancel`]: https://typelevel.org/cats-effect/docs/typeclasses/monadcancel
262+
[NGINX Unit]: https://unit.nginx.org/
263+
[`PollingExecutorScheduler`]: https://github.com/typelevel/cats-effect/blob/7ca03db50342773a79a01ecf137d953408ac6b1d/core/native/src/main/scala/cats/effect/unsafe/PollingExecutorScheduler.scala
264+
[quiche]: https://github.com/cloudflare/quiche
265+
[rediculous]: https://github.com/davenverse/rediculous
266+
[`Resource`]: https://typelevel.org/cats-effect/docs/std/resource
267+
[sbt-vcpkg]: https://github.com/indoorvivants/sbt-vcpkg/
268+
[ScalablyTyped]: https://scalablytyped.org/
269+
[Scala Native]: https://scala-native.org/
270+
[Scala.js]: https://www.scala-js.org/
271+
[scala-java-locales]: https://github.com/cquiroz/scala-java-locales
272+
[scala-java-time]: https://github.com/cquiroz/scala-java-time
273+
[scala-native-loop]: https://github.com/scala-native/scala-native-loop/
274+
[`Scheduler`]: https://github.com/typelevel/cats-effect/blob/236a0db0e95be829de34d7a8e3c06914738b7b06/core/shared/src/main/scala/cats/effect/unsafe/Scheduler.scala
275+
[Skunk]: https://github.com/tpolecat/skunk
276+
[smithy4s]: https://disneystreaming.github.io/smithy4s/
277+
[`Socket`]: https://www.javadoc.io/doc/co.fs2/fs2-docs_2.13/latest/fs2/io/net/Socket.html
278+
[SQLite]: https://www.sqlite.org/index.html
279+
[snunit]: https://github.com/lolgab/snunit
280+
[sn-bindgen]: https://github.com/indoorvivants/sn-bindgen
281+
[s2n-tls]: https://github.com/aws/s2n-tls
282+
[TLS]: https://en.wikipedia.org/wiki/Transport_Layer_Security
283+
[`TLSSocket`]: https://www.javadoc.io/doc/co.fs2/fs2-docs_2.13/latest/fs2/io/net/tls/TLSSocket.html
284+
[web assembly]: https://twitter.com/ShadajL/status/1548020571597811719
285+
[you-forgot-a-percentage-sign-or-a-colon]: https://youforgotapercentagesignoracolon.com/
