---
title: "Infrastructure Week - Day 2: What Do We Do With Our Infrastructure?"
layout: post
---

## What Does Void Do With the Infrastructure?

Yesterday we looked at what kinds of infrastructure Void has, how it's
managed, and what makes each kind unique and differently suited.
Today we'll look at what runs on that infrastructure and what it does.
Finally, we'll look at how we make sure it keeps running in the event
of an error or disruption.

## Two Kinds of Services

Void runs, broadly speaking, two different categories of services. In
the first category is the tooling that supports maintainers and makes
it easier, or in some cases possible, to work on Void. These are
services that most users are unaware of and generally don't interact
with. In the second category are systems that general end users of
Void interact with and are more likely to know about or recognize.

## Public Services

We'll first talk about the public services that are broadly available
to both maintainers and general consumers of Void Linux. These are
almost, but not entirely, web-based services accessed via a browser.
See how many of these services you recognize.

### Void's Website

Void's website (the one you are reading right now) is a GitHub Pages
Jekyll site. The content is checked into `git`, rendered by a worker
process in the GitHub network, and then published to a CDN where you
can read it. Additionally, Jekyll produces feeds suitable for
consumption in an RSS reader. The website is probably our simplest
service and the easiest to copy on your own, since it requires no
special infrastructure to set up, just a GitHub account.

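Each post is just a Markdown file with a small block of YAML front
matter at the top; Jekyll turns anything in `_posts/` with a block
like this into a page and a feed entry (the values shown mirror this
post's own header):

```yaml
---
title: "Infrastructure Week - Day 2: What Do We Do With Our Infrastructure?"
layout: post
---
```
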
### Void Mirrors

Void's mirrors are simple nginx webservers that host static copies of
all our software. They also carry some other sites whose content we
host ourselves, such as the [docs site](https://docs.voidlinux.org)
and the dedicated [infrastructure docs
site](https://infradocs.voidlinux.org). We host these on our own
systems because they both use mdBook, which is not as straightforward
to use with a hosting service like GitHub Pages. Running these sites
this way also means they are broadly replicated in the event of a
failure in any of our systems. Did you know you can go to `/docs` on
any mirror to read the Void handbook?

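A static mirror like this needs very little configuration. As a rough
sketch of the kind of nginx server block involved (the hostname and
paths here are illustrative, not Void's actual configuration):

```nginx
server {
    listen 443 ssl;
    server_name mirror.example.org;  # illustrative hostname

    root /srv/mirror;   # static package tree, synced via rsync
    autoindex on;       # allow browsing the repo directories

    # the handbook is just more static files under the same root
    location /docs/ {
        try_files $uri $uri/ /docs/index.html;
    }
}
```
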
### Popcorn

Popcorn is a package statistics service that reports on the popularity
of packages, as provided by systems that have opted in to reporting
their package information. Though we are evaluating ways to replace
the data provided by Popcorn, it still provides good real-world data
on package installs. You can learn more about Popcorn [in the
handbook](https://docs.voidlinux.org/contributing/index.html#usage-statistics).

### Source Site

The sources site (<https://sources.voidlinux.org>) provides a copy of
all the sources exactly as our build servers consumed them. This gives
us a quick and easy way to be sure we are troubleshooting a bad build
against the same source, for the cases where finding the fault
requires more than just the build error logs.

### xq-api

Some functionality on our website requires the ability to query the
Void repository data. This is accomplished by fronting the repodata
files with a service called `xq-api`, which provides query
functionality on top of them. The data is refreshed frequently, so
new packages quickly show up in the website's search results, and
packages that are no longer available in our repos are removed
promptly.

### Old Wiki Snapshot

At one time, prior to the introduction of our docs site, Void
maintained a MediaWiki instance. While MediaWiki is extremely
powerful software and a great choice for hosting a wiki, Void found
that our wiki was slowly filling with hyper-specific guides, abandoned
pages, and lower-quality versions of pages that exist on the [Arch
Linux Wiki](https://wiki.archlinux.org/). While we ported over to the
docs a large number of pages that remained generally applicable, we
also felt it was important to archive the entire wiki as it appeared
before releasing the resources powering it. This was accomplished
using a wiki crawler that converted the wiki into an archive format we
now serve with a Kiwix server. You can find that old content at
<https://wiki.voidlinux.org> should it interest you.

### Online Man Pages

Void makes a copy of the entire contents of our man page database
available online, so users can easily search for commands even when
not on a Void system, such as at install time, when internet access
may not yet be available from the Void device itself. This service
involves a task that routinely extracts the man pages from all
packages using an XBPS-specific program; the files are then arranged
on disk to be served by the `mdocml` man page server, a program we
obtain from OpenBSD. You can browse our online manuals at
<https://man.voidlinux.org>.

## Services That Help Maintainers

Not all services are meant for public consumption. A number of Void's
services exist to help maintainers be more productive, produce build
artifacts, or generally make our workflows easier.

### Build Pipeline

The build pipeline was discussed in detail in [another
post](/news/2023/02/1-new-repo-fastly.html), but we'll recap it here.
In general, there are a handful of powerful servers on which we run
automated build tasks: they run `xbps-src` whenever the contents of
`void-packages` are updated. Once the packages are built, they are
collected at a central point, cryptographically signed to attest that
they are in fact packages produced by Void, and then copied out to
mirrors around the world for users to download.
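
Condensed into shell form, the recap above looks roughly like the
following dry-run sketch. The hostnames and paths are made up for
illustration, and the `xbps-src`/`xbps-rindex` invocations are
representative rather than copied from Void's real automation:

```shell
#!/bin/sh
# Dry-run sketch of the build pipeline stages; `run` prints each
# command instead of executing it, so the flow is visible anywhere.
set -eu
PKG="${1:-hello}"
REPO=/srv/repo/current   # illustrative central repo path

run() { printf '+ %s\n' "$*"; }

# 1. A builder compiles the package with xbps-src.
run ./xbps-src pkg "$PKG"

# 2. Built packages are collected at a central point.
run rsync -a hostdir/binpkgs/ build-master:"$REPO"/

# 3. The repository index is regenerated and signed.
run xbps-rindex -a "$REPO"/*.xbps
run xbps-rindex --sign --signedby 'Void Linux' --privkey /path/key.pem "$REPO"

# 4. The signed repo fans out to the public mirrors.
run rsync -a "$REPO"/ mirror1:/srv/mirror/current/
```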
126+
127+
The build pipeline is the single largest collection of moving parts
128+
within our infrastructure, and is usually the component that breaks
129+
the most often as it has many exciting failure modes. Some of the
130+
author's favorites include running out of disk, stuck connection poll
131+
loops, and rsync just wandering off instead of synchronizing packages.
### Email

Void maintainers have access to email on the voidlinux.org domain. To
provide this service, Void runs its own mail server. We make use of
[maddy](https://maddy.email), which provides a convenient all-in-one
mail server. It works well at our scale and doesn't require a
significant amount of maintainer time to keep running. Though most of
us read our mail with a combination of desktop and CLI clients, we
also run a copy of the [Alps](https://git.sr.ht/~migadu/alps) web
frontend, which allows quick and easy access to mail when away from
our normal consoles.

### DevSpace

Sometimes when preparing a fix or updating a package, a maintainer
will want to share the newly built artifact with others to gather
feedback or to see if the fix works. To enable this quickly and
easily, we have a dedicated webserver and SFTP share box for these
files. You can see things we're currently working on, or haven't yet
cleaned up, at <https://devspace.voidlinux.org/>, where the files are
organized by maintainer.

Sometimes end users filing an issue ticket will be asked to fetch a
build from devspace to verify that a particular fix works, or that a
given problem persists when a package or disk image is rebuilt from
clean sources.

### void-robot and void-fleet

Void's team communicates primarily via IRC. To allow our
infrastructure to communicate with us, we have a pair of IRC bots that
inform us of status changes. The chattier of the bots, `void-robot`,
tells us when PRs change status or when references change on Void's
many git repos. This lets us know when changes are going out, and
it's not uncommon for a maintainer to ping someone else with a single
`^` to gesture at a push or reference the bot has printed to the
channel.

The second bot speaks on behalf of our monitoring infrastructure and
notifies us when things break or are resolved. We'll take a deeper
look at monitoring, and at what this bot does, in a future post.

### Nomad, Consul & Vault

Many of Void's more modern services run in containers managed by
HashiCorp Nomad. These services retrieve secrets from HashiCorp
Vault, and can locate each other using HashiCorp Consul. These tools
let us largely abstract away which provider any given piece of
software runs on and where in the world it resides. They also make it
much easier to replace a host, or take one down for maintenance,
without interrupting access to user-facing services.

The use of well-understood tools like the Hashistack also makes it
much easier for us to subdivide systems and check components locally.

### NetAuth

With all these services, it would be inconvenient for maintainers to
maintain separate usernames and passwords for everything. To avoid
this, we use Single Sign-On: every service that supports it reaches
out to a centralized, secure authentication service. You can read
more about NetAuth at <https://netauth.org>.

## How Does All This Get Run?

For some of Void's older services, notably the build farm itself,
everything is configured, provisioned, and maintained using Ansible,
just like the underlying OS configuration. This works well, but has
some drawbacks: it is difficult to test, difficult to change in an
idempotent way, and difficult to explain to others, since it's firmly
the realm of infrastructure engineering. Trying to explain how a
hundred lines of YAML gets converted into a working webserver
requires detours through a number of other assorted technologies.

Void's newer services run uniformly as containers managed by Nomad.
This lets us dynamically move workloads around, have machines
self-heal and update in coordination with the fleet, and provide a
lens into our infrastructure for people to look through. You can
explore all our running containers in a limited read-only context on
the [Nomad dashboard](https://nomad.voidlinux.org). Before you go
trying to open a security notice, though: we're aware that buttons
that shouldn't be visible look clickable. Rest assured that the
anonymous policy providing the view access can't actually stop jobs
or drain nodes (we've reported this UI bug a few times already).

What Nomad does under the hood is actually quite clever. It assesses
what we want to run and what resources we have available to run it,
then applies any constraints we've set on the services themselves.
These constraints encode information like requiring locality to a
particular disk in the fleet, or requiring that two copies of a
service reside on different hosts. This gets converted into a plan of
what will run where, and the workload is distributed across the
machines in the fleet. If a server fails to check in periodically,
the workload on it is considered "lost" and can be restarted
elsewhere if allowed. When we need to move between providers or
update hardware, Nomad gives us a quick and easy way to work out how
much of a machine we're actually consuming, as well as actually
performing the movement of services from one location to another.
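
Those constraints are expressed directly in the job specification. A
minimal, hypothetical Nomad job illustrating the "two copies on
different hosts" rule might look like this (the job, datacenter, and
image names are invented; `distinct_hosts` is a real Nomad constraint
operator):

```hcl
job "example-web" {
  datacenters = ["void-dc1"]   # hypothetical datacenter name

  group "web" {
    count = 2   # run two copies of the service

    # Require each copy to land on a different machine.
    constraint {
      operator = "distinct_hosts"
      value    = "true"
    }

    task "server" {
      driver = "docker"
      config {
        image = "example/web:latest"   # invented image
      }
    }
  }
}
```
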

While Nomad is very clever and makes a lot of things much easier, we
still have a number of services that run directly on the Void system
installed on the machines. For services that run on the metal
directly, we almost always use runit to supervise the tasks and
restart them when they crash. This works well, but it tightly couples
a service to the machine on which it is installed, and requires
coordination with Ansible to make sure that restarts happen when they
are supposed to during maintenance activities. For services that run
in containers, we can simply set the restart policy on the container
and let the runtime supervise the services, as well as handle any
cascading restarts that need to happen, such as when certificates are
renewed or rotated.
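
A runit-supervised service on one of these machines is little more
than a directory with a `run` script. A sketch of the pattern (the
daemon and its flags are invented here, not one of Void's actual
services):

```shell
#!/bin/sh
# /etc/sv/example-daemon/run -- hypothetical runit service script.
# runit's runsv executes this script and re-runs it on exit, which
# is how crashed services come back automatically.
[ -r ./conf ] && . ./conf          # optional per-service settings
exec 2>&1                          # send stderr to the service log
exec example-daemon --foreground   # must NOT daemonize itself
```

On Void, enabling such a service is just a matter of symlinking the
directory into `/var/service/`.
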

In general, all our services have at least one layer of supervision
in the form of Void's runit-based init system, and in many cases more
application-specific supervision on top of that, often with status
checks to validate assumptions made about the readiness of a service.

---

This has been day two of Void's infrastructure week. Check back
tomorrow to learn how we know that the services we run are up, and
how we verify that once they're up, they're behaving as expected.
This post was authored by `maldridge`, who runs most of the
day-to-day operations of the Void fleet. Feel free to ask questions
on [GitHub
Discussions](https://github.com/void-linux/void-packages/discussions/45099)
or in [IRC](https://web.libera.chat/?nick=Guest?#voidlinux).
