This repository contains my solution for Advent Of FPGA 2025. I chose to solve day 3 and implemented my solution in OCaml using the Hardcaml library
The following tools are required:
opam(≥ 2.5)dune(installed withopam install dune)
From the project root, run:
opam switch create . ocaml.5.3.0
eval $(opam env)
opam install . --deps-only
dune buildAfter seting up the project, running
dune exec bin/day3.exeshould print:
172740584266849
The core problem in Day 3 (Part Two) is:
- Given a sequence of digits, select exactly K = 12 digits, in order, such that the resulting number is as large as possible.
The algorithm below implements a greedy, stack-based solution that constructs the optimal number incrementally as the input is scanned from left to right.
def max_number_k_digits(s: str, K: int) -> int:
stack = []
n = len(s)
for i, c in enumerate(s):
d = int(c)
remaining = n - i
while (
stack and
stack[-1] < d and
len(stack) - 1 + remaining >= K
):
stack.pop()
if len(stack) < K:
stack.append(d)
return int("".join(map(str, stack)))- Scan digits left to right
- Maintain the lexicographically largest subsequence from the digits seen so far.
- If a new digit is larger than the previous one, try to replace it
- Only pop digits if we are guaranteed to still be able to reach K digits total
A more hardware friendly python implementation of this algorithm can be found in python_reference/solve_hardware_friendly.py
In the software reference implementation, the stack is pruned using a while loop that repeatedly pops elements until the greedy condition is satisfied. This form of control flow does not map well to hardware, as it would require variable-length iteration and could stall the pipeline.
Instead, the hardware design computes the new stack pointer in a single clock cycle, using combinational logic.
For each stack index i, three conditions are evaluated in parallel:
-
- Digit comparison:
Whether the incoming digit is strictly greater than the stored digit:
stack[i] < data_in - Digit comparison:
-
- Space constraint:
Whether enough input remains to still reach K = depth digits if this entry is removed:
remaining >= depth - i
- Space constraint:
-
- Index contrains:
Any index at or above the current stack pointer is always considered valid:
i >= sp
- Index contrains:
These conditions are combined into a bit vector:
let valid_pop = (gt_data_in &: space_ok) |: idx_gte_spThe new stack pointer is then computed by counting the number of trailing 1s in this vector:
let trailing_ones =
Aofpga.Count_trailing_ones.create (uresize valid_pop 16)The trailing_ones count subtracted from K = depth yields the lowest index in the stack that should be popped, allowing the stack pointer to be updated directly without iterative popping.
The end of an input line is detected when a remaining-input counter reaches zero:
let end_of_line = (remaining ==:. 0) &: ~:(stack.empty)At this point:
- The stack contains the final 12-digit result for the line
- A conversion from BCD digits to binary is triggered
- The stack pointer is reset for the next line
The selected digits are stored in BCD form. Conversion to binary is performed by a multi-cycle sequential unit that processes one digit per clock cycle:
let line_ans = Aofpga.Bcd_to_binary.create
~clock
~digits:stack.mem
~start:end_of_lineThis conversion does not stall input processing, which allows the design to maintain a throughput of one input digit per cycle, even though BCD-to-binary conversion itself is sequential.
Once the conversion finishes, the resulting binary value is accumulated into the final answer:
let ans =
reg_fb
spec
~enable:accumelate_ans
~width:ans_width
~f:(fun q ->
q +: (uresize line_ans.result ans_width)
)This design implements a greedy maximum-subsequence algorithm in hardware by computing stack pruning decisions in a single cycle using parallel comparisons. Multi-cycle BCD-to-binary conversion is decoupled from input processing, allowing continuous throughput of one digit per clock cycle while accumulating results across lines.