Split monolithic docs/api.md into per-module reference pages
Problem
docs/api.md is ~1,900 lines of hand-maintained API reference. As the API grows this becomes increasingly hard to navigate and keep in sync with the source.
Prerequisite: docstring linting must land first
Before starting the mkdocstrings migration, ruff + pydoclint should be set up and passing in CI. Doing it in the wrong order means writing new NumPy-style docstrings in an unenforced environment and potentially having to fix them again later.
The planned approach is a dedicated branch to assess the blast radius before committing to fixes:
- Add ruff with D rules and convention = "numpy" to pyproject.toml
- Run and triage: D100/D101/D102/D103 (missing docstrings) will likely be noisy and can be deferred to extend-ignore until docstrings are actually written
- Add pydoclint for parameter/return consistency checks (ruff checks docstring formatting; pydoclint checks that documented parameters match the actual function signature)
- Fix violations or document ignore rationale
- Wire both into CI
- Document NumPy docstring style as the project standard in CONTRIBUTING.md
Note: ruff only covers .py and .pyi files. Rust /// doc comments on PyO3-exposed items also need to follow NumPy style (so they render correctly if stubs are ever auto-generated), but no tool enforces this automatically — it is a contributor convention enforced by code review.
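To make the ruff/pydoclint split concrete, here is a rough sketch of the kind of signature-versus-docstring comparison that pydoclint automates. It is an illustration only, not pydoclint's implementation: `documented_params`, `check_params`, and the `scale` function are hypothetical names, and the parser only understands the simple `name : type` form.

```python
import inspect
import re


def documented_params(doc):
    """Return the names listed in a NumPy-style ``Parameters`` section."""
    lines = inspect.cleandoc(doc).splitlines()
    names = []
    section = None
    for i, line in enumerate(lines):
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if following and set(following) == {"-"}:
            # A line followed by a dashed underline is a section header.
            section = line.strip()
            continue
        if set(line.strip()) == {"-"}:
            continue  # the underline itself
        if section == "Parameters":
            # Entries are unindented; descriptions are indented and skipped.
            match = re.match(r"^(\w+) : ", line)
            if match:
                names.append(match.group(1))
    return names


def check_params(func):
    """Return (undocumented, stale) parameter names for ``func``."""
    actual = [p for p in inspect.signature(func).parameters if p not in ("self", "cls")]
    documented = documented_params(func.__doc__ or "")
    undocumented = [p for p in actual if p not in documented]
    stale = [p for p in documented if p not in actual]
    return undocumented, stale


def scale(values, factor):
    """Multiply each value by a constant.

    Parameters
    ----------
    values : list of float
        Numbers to scale.
    ratio : float
        Stale name on purpose: the signature calls this ``factor``.

    Returns
    -------
    list of float
        The scaled values.
    """
    return [v * factor for v in values]


if __name__ == "__main__":
    print(check_params(scale))
```

A docstring like the one on `scale` is well formed, so a formatting check alone would not object to it; only a comparison against the signature reveals that `factor` is undocumented and `ratio` is stale.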
Proposed solution
1. Establish NumPy-style docstrings as the project standard
The codebase already uses NumPy-style docstrings in places (e.g. ParseHandler). This should be formalised as the project-wide standard before migrating to auto-generation, so that all docstrings are consistent and mkdocstrings can be configured once.
This applies to:
- All Python source files (Parameters\n----------, Returns\n-------, etc.)
- cifflow_core.pyi: the hand-written stub currently uses plain prose docstrings; these should be updated to NumPy style
- Rust /// doc comments on PyO3-exposed items (#[pyclass], #[pyfunction]): these must also use NumPy style so they render correctly if stubs are ever auto-generated
Key gotcha: NumPy style requires parameter : (space before the colon). Written as parameter: (no space), the entry is silently mis-parsed by mkdocstrings and the parameter is dropped from the generated output. Enforce this via ruff (D rules, NumPy convention) or pydoclint in CI.
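As an illustration of the spacing rule, the sketch below puts both forms in one docstring and flags the malformed entry. The `clamp` function and the `suspicious_param_lines` helper are hypothetical, and the helper is a crude heuristic (it looks at every unindented line, not only the Parameters section), so it is no substitute for the linters.

```python
import inspect
import re


def clamp(value, low, high):
    """Restrict a value to a closed interval.

    Parameters
    ----------
    value : float
        Number to restrict.
    low : float
        Lower bound.
    high: float
        Upper bound (malformed on purpose: no space before the colon).

    Returns
    -------
    float
        ``value`` limited to ``[low, high]``.
    """
    return max(low, min(high, value))


def suspicious_param_lines(doc):
    """Return unindented ``name:`` lines that lack the space before the colon."""
    lines = inspect.cleandoc(doc).splitlines()
    return [line for line in lines if re.match(r"^\w+:(\s|$)", line)]


if __name__ == "__main__":
    for line in suspicious_param_lines(clamp.__doc__):
        print("missing space before colon:", line)
```

Only the `high: float` entry is reported; `value : float` and `low : float` follow the NumPy form and pass.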
2. Split into per-module files
Each top-level domain gets its own page, mirroring the module layout:
    docs/
      api/
        index.md        # quick-reference table: symbol → one-liner → link
        types.md
        parser.md
        model.md
        builder.md
        writer.md
        clean.md
        output.md
        dictionary.md
        ingestion.md
        fidelity.md
        validation.md
        inspect.md
        visualise.md
3. Switch to auto-generation with MkDocs + mkdocstrings
Migrate prose from api.md into docstrings, then let the toolchain generate reference pages. This removes the dual-maintenance problem.
- Python: MkDocs + mkdocstrings + Material theme
- PyO3 extension: mkdocstrings reads cifflow_core.pyi directly; no auto-generation tooling needed unless the stub surface grows significantly
- Configure mkdocstrings for NumPy style:
    # mkdocs.yml
    plugins:
      - mkdocstrings:
          handlers:
            python:
              options:
                docstring_style: numpy
Reference pages become:
    # CIF model

    ::: cifflow.cifmodel.builder.build

    ::: cifflow.cifmodel.builder.CifBuilder
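If the per-module pages are scaffolded by script rather than by hand, a minimal sketch might look like the following. The `render_page` and `write_pages` helpers and the `PAGES` mapping are hypothetical; only the "CIF model" title and the two `cifflow.cifmodel.builder` identifiers come from the example above, and pairing them with `model.md` is an assumption.

```python
from pathlib import Path

# Hypothetical mapping: page file name -> (page title, symbols to document).
PAGES = {
    "model.md": (
        "CIF model",
        [
            "cifflow.cifmodel.builder.build",
            "cifflow.cifmodel.builder.CifBuilder",
        ],
    ),
}


def render_page(title, symbols):
    """Build the Markdown for one reference page of mkdocstrings directives."""
    lines = [f"# {title}", ""]
    for symbol in symbols:
        # Each ``:::`` line asks mkdocstrings to render that symbol's docstring.
        lines += [f"::: {symbol}", ""]
    return "\n".join(lines)


def write_pages(root, pages=PAGES):
    """Write every page under ``root`` and return the paths created."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, (title, symbols) in pages.items():
        path = root / filename
        path.write_text(render_page(title, symbols), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    for path in write_pages("docs/api"):
        print("wrote", path)
```

Hand-written sections can still be added above the directives afterwards, so a scaffold like this only saves the initial typing.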
4. Keep hand-written content where auto-gen falls short
Some content in the current api.md is too rich for docstrings alone and should be preserved as hand-written sections above the auto-generated symbol reference:
- Behavioural guarantees table
- Presence-state encoding tables (CifValue, ColumnDef)
- EmitMode block-partitioning rules
- OutputPlan / BlockSpec interaction semantics
5. Documentation policy
- Exported symbols → full reference page (signature, parameters, return, behavioural notes, example) with NumPy-style docstring
- Internal modules (lexer.py, duckdb_ingest.py, textfield.py, etc.) → inline comments only, no reference page
- Deprecations surfaced inline with the relevant symbol, not just in DdlmItem fields
Out of scope
This issue is reference docs only. Guides, tutorials, and how-to content are separate.
Acceptance criteria
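- NumPy docstring style documented as the project standard (CONTRIBUTING.md)
- cifflow_core.pyi docstrings updated to NumPy style
- ruff or pydoclint NumPy docstring linting added to CI
- MkDocs + mkdocstrings configured with docstring_style: numpy and builds without errors
- docs/api.md is removed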