Advent Of FPGA 2025 (Day 3)

This repository contains my solution for Advent Of FPGA 2025. I chose to solve day 3 and implemented my solution in OCaml using the Hardcaml library

Requirements

The following tools are required:

opam (≥ 2.5)
dune (installed with opam install dune)

Project Setup

From the project root, run:

opam switch create . ocaml.5.3.0
eval $(opam env)
opam install . --deps-only
dune build

Running The Solution

After seting up the project, running

dune exec bin/day3.exe

should print:

172740584266849

Algorithm Overview

The core problem in Day 3 (Part Two) is:

Given a sequence of digits, select exactly K = 12 digits, in order, such that the resulting number is as large as possible.

Reference Software Algorithm (Python)

The algorithm below implements a greedy, stack-based solution that constructs the optimal number incrementally as the input is scanned from left to right.

def max_number_k_digits(s: str, K: int) -> int:
    stack = []
    n = len(s)

    for i, c in enumerate(s):
        d = int(c)
        remaining = n - i

        while (
            stack and
            stack[-1] < d and
            len(stack) - 1 + remaining >= K
        ):
            stack.pop()

        if len(stack) < K:
            stack.append(d)

    return int("".join(map(str, stack)))

Scan digits left to right
Maintain the lexicographically largest subsequence from the digits seen so far.
If a new digit is larger than the previous one, try to replace it
Only pop digits if we are guaranteed to still be able to reach K digits total

A more hardware friendly python implementation of this algorithm can be found in python_reference/solve_hardware_friendly.py

Key Hardware Design Decisions

Single-Cycle Stack Pointer Pruning

In the software reference implementation, the stack is pruned using a while loop that repeatedly pops elements until the greedy condition is satisfied. This form of control flow does not map well to hardware, as it would require variable-length iteration and could stall the pipeline.

Instead, the hardware design computes the new stack pointer in a single clock cycle, using combinational logic.

For each stack index i, three conditions are evaluated in parallel:

1. Digit comparison:
  Whether the incoming digit is strictly greater than the stored digit:
```
stack[i] < data_in
```
1. Space constraint:
  Whether enough input remains to still reach K = depth digits if this entry is removed:
```
remaining >= depth - i
```
1. Index contrains:
  Any index at or above the current stack pointer is always considered valid:
```
i >= sp
```

These conditions are combined into a bit vector:

let valid_pop = (gt_data_in &: space_ok) |: idx_gte_sp

The new stack pointer is then computed by counting the number of trailing 1s in this vector:

let trailing_ones =
  Aofpga.Count_trailing_ones.create (uresize valid_pop 16)

The trailing_ones count subtracted from K = depth yields the lowest index in the stack that should be popped, allowing the stack pointer to be updated directly without iterative popping.

End-of-Line Detection and Pipeline Decoupling

The end of an input line is detected when a remaining-input counter reaches zero:

let end_of_line = (remaining ==:. 0) &: ~:(stack.empty)

At this point:

The stack contains the final 12-digit result for the line
A conversion from BCD digits to binary is triggered
The stack pointer is reset for the next line

Parallel BCD-to-Binary Conversion

The selected digits are stored in BCD form. Conversion to binary is performed by a multi-cycle sequential unit that processes one digit per clock cycle:

let line_ans = Aofpga.Bcd_to_binary.create
  ~clock
  ~digits:stack.mem
  ~start:end_of_line

This conversion does not stall input processing, which allows the design to maintain a throughput of one input digit per cycle, even though BCD-to-binary conversion itself is sequential.

Accumulation of Results

Once the conversion finishes, the resulting binary value is accumulated into the final answer:

let ans =
  reg_fb
    spec
    ~enable:accumelate_ans
    ~width:ans_width
    ~f:(fun q ->
      q +: (uresize line_ans.result ans_width)
    )

Summary

This design implements a greedy maximum-subsequence algorithm in hardware by computing stack pruning decisions in a single cycle using parallel comparisons. Multi-cycle BCD-to-binary conversion is decoupled from input processing, allowing continuous throughput of one digit per clock cycle while accumulating results across lines.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
bin		bin
lib		lib
python_reference		python_reference
test		test
.gitignore		.gitignore
README.md		README.md
aofpga.opam		aofpga.opam
day3.txt		day3.txt
dune-project		dune-project

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Advent Of FPGA 2025 (Day 3)

Requirements

Project Setup

Running The Solution

Algorithm Overview

Reference Software Algorithm (Python)

Key Hardware Design Decisions

Single-Cycle Stack Pointer Pruning

End-of-Line Detection and Pipeline Decoupling

Parallel BCD-to-Binary Conversion

Accumulation of Results

Summary

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Advent Of FPGA 2025 (Day 3)

Requirements

Project Setup

Running The Solution

Algorithm Overview

Reference Software Algorithm (Python)

Key Hardware Design Decisions

Single-Cycle Stack Pointer Pruning

End-of-Line Detection and Pipeline Decoupling

Parallel BCD-to-Binary Conversion

Accumulation of Results

Summary

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages