Branch vs. Branchless

Program execution is faster and more efficient when microprocessors can continuously fetch instructions from the cache. This is particularly important in High-Performance Computing (HPC) and Real-Time Systems (RTS), where execution times have to be predictable and not fluctuate. In the event of a cache miss, new instructions have to be fetched from RAM to repopulate the cache, causing performance to suffer.

The more predictable the execution path of a program, the better for performance. Branchless programming is a coding technique to minimize the number of possible execution branches. Think of if statements executing one part of the program or another; when the predicted code is in one branch, but the other branch needs to run, this causes inefficiencies and delays.

Here we take a look at two versions of a function to determine which of two numbers is the largest. The first version uses execution branches with an if statement, and the other version uses branchless programming. The code was tested on a MacBook Pro running macOS Sequoia 15.5 with an M1 Max and on a PC running Ubuntu 24.04 with an Intel Core i7-7567U.

You can reproduce the experiment yourself and share your observations. We will see how to do that in the Run it yourself section. The following table shows the execution times (in milliseconds) with different levels of compiler optimization.

Function	Architecture	No Optimization	-O1	-O2	-O3	-Os
branch	M1 Max	41794.074	0.005	0.003	0.003	0.003
branchless	M1 Max	37546.447	0.001	0.001	0.001	0.001
branch	Corei7-7567U	35539.531	258.027	0.005	0.005	0.005
branchless	Corei7-7567U	54218.671	507.746	0.004	0.004	0.004

On macOS with Apple Silicon, the branchless code consistently outperforms the branched code across all levels of compiler optimization. The same is not observed on the Ubuntu PC with an Intel processor. The branchless code is only faster when compiled with -O2 or higher levels of optimization.

Run it yourself

Clone or download this repository.

Create a directory for the binary files:

mkdir bin

Compile the code:

gcc -std=c2x branches.c -o bin/branches

Run the program:

./bin/branches

Repeat compilation and running with different levels of compiler optimization:

gcc -std=c2x -O1 branches.c -o bin/branches
./bin/branches

gcc -std=c2x -O2 branches.c -o bin/branches
./bin/branches

gcc -std=c2x -O3 branches.c -o bin/branches
./bin/branches

gcc -std=c2x -Os branches.c -o bin/branches
./bin/branches

You can also take a look at the assembly code for each function for different optimization levels. Although the branchless code may contain more instructions, it avoids jumps. Fetching and running instructions from the cache is faster than a cache miss requiring fetching new instructions from RAM.

gcc -std=c2x -S branches.c -o branches.asm
gcc -std=c2x -S -O1 branches.c -o branches_O1.asm
gcc -std=c2x -S -O2 branches.c -o branches_O2.asm
gcc -std=c2x -S -O3 branches.c -o branches_O3.asm
gcc -std=c2x -S -Os branches.c -o branches_Os.asm

License

Apache License, Version 2.0. See the LICENSE file for more info.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
branches.c		branches.c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Branch vs. Branchless

Run it yourself

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Branch vs. Branchless

Run it yourself

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages