Commit 14ef95b
kernel/fork: group allocation/free of per-cpu counters for mm struct
A trivial execve scalability test which tries to be very friendly
(statically linked binaries, all separate) is predominantly bottlenecked
by back-to-back per-cpu counter allocations which serialize on global
locks.
Ease the pain by allocating and freeing them in one go.
Bench can be found here:
http://apollo.backplane.com/DFlyMisc/doexec.c
$ cc -static -O2 -o static-doexec doexec.c
$ ./static-doexec $(nproc)
Even at a very modest scale of 26 cores (ops/s):
before: 133543.63
after: 186061.81 (+39%)
While with the patch these allocations remain a significant problem,
the primary bottleneck shifts to page release handling.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20230823050609.2228718-3-mjguzik@gmail.com
[Dennis: reflowed 1 line]
Signed-off-by: Dennis Zhou <dennis@kernel.org>
1 parent c439d5e commit 14ef95b
1 file changed
Lines changed: 4 additions & 11 deletions
Diff hunks (line numbers only; the changed source lines are not included):

| Original lines | New lines | Lines removed (original numbering) | Lines added (new numbering) |
|---|---|---|---|
| 909-916 | 909-914 | 912-913 | none |
| 925-933 | 923-930 | 929-930 | 926 |
| 1252-1259 | 1249-1254 | 1255-1256 | none |
| 1301-1317 | 1296-1310 | 1304-1306, 1313-1314 | 1299-1301 |
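The commit message describes replacing back-to-back counter allocations, each of which takes a global lock, with one grouped allocation and one grouped free. The following is a minimal userspace sketch of that idea, not the kernel code: `struct counter`, `NR_COUNTERS`, `NR_CPUS`, `alloc_lock`, `lock_acquisitions`, and all function names are hypothetical stand-ins, and a pthread mutex around `calloc` is assumed here only to model an allocator that serializes on a global lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_COUNTERS 4   /* hypothetical: number of counters per object */
#define NR_CPUS     8   /* hypothetical: slots per counter */

struct counter {
    long *percpu;       /* stand-in for a per-cpu allocation */
    long count;
};

/* Models the global lock that every allocation and free must take. */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long lock_acquisitions;   /* instrumentation only */

static long *locked_alloc(size_t n_longs)
{
    pthread_mutex_lock(&alloc_lock);
    lock_acquisitions++;
    long *p = calloc(n_longs, sizeof(*p));
    pthread_mutex_unlock(&alloc_lock);
    return p;
}

static void locked_free(long *p)
{
    pthread_mutex_lock(&alloc_lock);
    lock_acquisitions++;
    free(p);
    pthread_mutex_unlock(&alloc_lock);
}

/* "Before": one allocation, and one lock round trip, per counter. */
int counters_init_each(struct counter *c, int n)
{
    for (int i = 0; i < n; i++) {
        c[i].percpu = locked_alloc(NR_CPUS);
        if (!c[i].percpu) {
            while (i--)                 /* unwind what was allocated */
                locked_free(c[i].percpu);
            return -1;
        }
        c[i].count = 0;
    }
    return 0;
}

void counters_destroy_each(struct counter *c, int n)
{
    for (int i = 0; i < n; i++) {
        locked_free(c[i].percpu);
        c[i].percpu = NULL;
    }
}

/* "After": a single allocation carved into n slices. */
int counters_init_many(struct counter *c, int n)
{
    assert(n > 0);
    long *base = locked_alloc((size_t)n * NR_CPUS);
    if (!base)
        return -1;
    for (int i = 0; i < n; i++) {
        c[i].percpu = base + (size_t)i * NR_CPUS;
        c[i].count = 0;
    }
    return 0;
}

void counters_destroy_many(struct counter *c, int n)
{
    locked_free(c[0].percpu);           /* c[0] holds the base pointer */
    for (int i = 0; i < n; i++)
        c[i].percpu = NULL;
}
```

In this sketch the per-counter path takes the lock 2 * NR_COUNTERS times over an init/destroy cycle, while the grouped path takes it twice. The grouped path also needs no unwind loop on failure, which may be part of why the real change removes more lines than it adds.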