docs/api.md is monolithic #42

@rowlesmr

Split monolithic docs/api.md into per-module reference pages

Problem

docs/api.md is ~1,900 lines of hand-maintained API reference. As the API grows this becomes increasingly hard to navigate and keep in sync with the source.

Prerequisite: docstring linting must land first

Before starting the mkdocstrings migration, ruff + pydoclint should be set up and passing in CI. Migrating first would mean writing new NumPy-style docstrings with nothing enforcing the style, and potentially having to fix them again once linting lands.

The planned approach is a dedicated branch to assess the blast radius before committing to fixes:

  1. Add ruff with the D (pydocstyle) rules and convention = "numpy" (set under [tool.ruff.lint.pydocstyle]) to pyproject.toml
  2. Run and triage: D100/D101/D102/D103 (missing docstrings) will likely be noisy and can go in extend-ignore until the docstrings are actually written
  3. Add pydoclint for parameter/return consistency checks (ruff checks docstring formatting; pydoclint checks that documented parameters match the actual function signature)
  4. Fix violations or document ignore rationale
  5. Wire both into CI
  6. Document NumPy docstring style as the project standard in CONTRIBUTING.md
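
To make the split between steps 1 and 3 concrete, here is a rough sketch of the kind of check pydoclint performs: comparing the names in a NumPy-style Parameters section against the function's real signature. This is a simplified illustration, not pydoclint's implementation; build_example, its parameters, and both helper names are hypothetical.

```python
import inspect
import re


def documented_params(docstring):
    """Return the names listed in a NumPy-style ``Parameters`` section."""
    lines = inspect.cleandoc(docstring).splitlines()
    names = []
    in_params = False
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        # A section title is any line followed by a dashed underline.
        if re.fullmatch(r"-+", nxt.strip()):
            in_params = line.strip() == "Parameters"
            continue
        # Skip blank lines, indented descriptions, and the underline itself.
        if not in_params or not line or line[0].isspace() or set(line) == {"-"}:
            continue
        names.append(line.split(" : ")[0].strip())
    return names


def signature_mismatches(func):
    """Compare documented parameter names with the actual signature."""
    actual = [p for p in inspect.signature(func).parameters if p not in ("self", "cls")]
    documented = documented_params(func.__doc__ or "")
    return {
        "undocumented": [p for p in actual if p not in documented],
        "stale": [p for p in documented if p not in actual],
    }


def build_example(source, strict=False, encoding="utf-8"):
    """Build a model from CIF text (hypothetical function for illustration).

    Parameters
    ----------
    source : str
        CIF text to parse.
    strict : bool
        Raise on the first error.
    mode : str
        No longer in the signature, so this entry is stale.

    Returns
    -------
    dict
        The parsed model.
    """
    return {}


print(signature_mismatches(build_example))
# {'undocumented': ['encoding'], 'stale': ['mode']}
```

A formatting-only linter would accept this docstring, since every section is well-formed; only a signature-aware check notices that encoding is missing and mode is stale.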

Note: ruff only covers .py and .pyi files. Rust /// doc comments on PyO3-exposed items also need to follow NumPy style (so they render correctly if stubs are ever auto-generated), but no tool enforces this automatically — it is a contributor convention enforced by code review.

Proposed solution

1. Establish NumPy-style docstrings as the project standard

The codebase already uses NumPy-style docstrings in places (e.g. ParseHandler). This should be formalised as the project-wide standard before migrating to auto-generation, so that all docstrings are consistent and mkdocstrings can be configured once.

This applies to:

  • All Python source files (Parameters\n----------, Returns\n-------, etc.)
  • cifflow_core.pyi — the hand-written stub currently uses plain prose docstrings; these should be updated to NumPy style
  • Rust /// doc comments on PyO3-exposed items (#[pyclass], #[pyfunction]) — must also use NumPy style so they render correctly if stubs are ever auto-generated

Key gotcha: NumPy style requires parameter : type (a space before the colon). Without that space (parameter: type), mkdocstrings silently mis-parses the entry and drops the parameter from the generated output. Enforce this via ruff (D rules, NumPy convention) or pydoclint in CI.
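
For reference, a minimal sketch of the spacing difference, together with a rough heuristic for spotting the no-space form. The docstring content and the missing_space_entries helper are hypothetical, and the regex is a crude illustration rather than a substitute for the linters above (it can also flag unindented prose such as "Note: ...").

```python
import inspect
import re

# Correct NumPy style: "name : type", with a space before the colon.
GOOD = """
Parse a CIF string.

Parameters
----------
text : str
    Raw CIF source.
strict : bool, optional
    Raise on the first error instead of collecting them.

Returns
-------
dict
    Parsed blocks keyed by name.
"""

# Same docstring with the space before the colon removed on one entry.
BAD = GOOD.replace("strict : bool", "strict: bool")

# An unindented entry whose colon directly follows the name.
MISSING_SPACE = re.compile(r"^(\w+):(\s|$)")


def missing_space_entries(docstring):
    """Return entry names written as ``name:`` instead of ``name :``."""
    return [
        m.group(1)
        for line in inspect.cleandoc(docstring).splitlines()
        if (m := MISSING_SPACE.match(line))
    ]


print(missing_space_entries(GOOD))  # []
print(missing_space_entries(BAD))   # ['strict']
```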

2. Split into per-module files

Each top-level domain gets its own page, mirroring the module layout:

docs/
  api/
    index.md        # quick-reference table: symbol → one-liner → link
    types.md
    parser.md
    model.md
    builder.md
    writer.md
    clean.md
    output.md
    dictionary.md
    ingestion.md
    fidelity.md
    validation.md
    inspect.md
    visualise.md

3. Switch to auto-generation with MkDocs + mkdocstrings

Migrate prose from api.md into docstrings, then let the toolchain generate reference pages. This removes the dual-maintenance problem.

  • Python: MkDocs + mkdocstrings + Material theme
  • PyO3 extension: mkdocstrings reads cifflow_core.pyi directly; no auto-generation tooling needed unless the stub surface grows significantly
  • Configure mkdocstrings for NumPy style:
# mkdocs.yml
plugins:
  - mkdocstrings:
      handlers:
        python:
          options:
            docstring_style: numpy

Reference pages become:

# CIF model

::: cifflow.cifmodel.builder.build
::: cifflow.cifmodel.builder.CifBuilder
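
If the pages are scaffolded rather than typed by hand, a small script could emit them. The following is only a sketch: the PAGES mapping, the choice of model.md as the file for these two symbols, and the function names are all assumptions, and the only identifiers used are the two from the example above.

```python
from pathlib import Path

# Hypothetical mapping: file stem -> (page title, mkdocstrings identifiers).
PAGES = {
    "model": (
        "CIF model",
        [
            "cifflow.cifmodel.builder.build",
            "cifflow.cifmodel.builder.CifBuilder",
        ],
    ),
}


def render_page(title, identifiers):
    """Return Markdown with a heading and one ``:::`` directive per symbol."""
    directives = "\n\n".join(f"::: {ident}" for ident in identifiers)
    return f"# {title}\n\n{directives}\n"


def write_pages(root, pages=PAGES):
    """Write one Markdown file per entry in ``pages`` under ``root``."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    written = []
    for stem, (title, identifiers) in pages.items():
        path = root / f"{stem}.md"
        path.write_text(render_page(title, identifiers), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    for path in write_pages("docs/api"):
        print(path)
```

Because write_pages overwrites each file, this would only suit the initial scaffold or pages with no hand-written sections.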

4. Keep hand-written content where auto-gen falls short

Some content in the current api.md is too rich for docstrings alone and should be preserved as hand-written sections above the auto-generated symbol reference:

  • Behavioural guarantees table
  • Presence-state encoding tables (CifValue, ColumnDef)
  • EmitMode block-partitioning rules
  • OutputPlan / BlockSpec interaction semantics

5. Documentation policy

  • Exported symbols → full reference page (signature, parameters, return, behavioural notes, example) with NumPy-style docstring
  • Internal modules (lexer.py, duckdb_ingest.py, textfield.py, etc.) → inline comments only, no reference page
  • Deprecations surfaced inline with the relevant symbol, not just in DdlmItem fields

Out of scope

This issue is reference docs only. Guides, tutorials, and how-to content are separate.

Acceptance criteria

  • NumPy docstring style documented as the project standard (e.g. in CONTRIBUTING.md)
  • All public Python docstrings use NumPy style
  • cifflow_core.pyi docstrings updated to NumPy style
  • ruff and pydoclint NumPy docstring linting added to CI
  • MkDocs + mkdocstrings configured with docstring_style: numpy and builds without errors
  • All currently-documented public symbols appear in the new structure
  • docs/api.md is removed
  • CI fails if docs don't build
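
The "all currently-documented public symbols appear" criterion could be checked mechanically. A minimal sketch, assuming every symbol is referenced by its own ::: directive (symbols pulled in implicitly through a module-level directive would be reported as missing); the function names are hypothetical, and the expected list would have to be compiled from the current api.md.

```python
import re
from pathlib import Path

# A mkdocstrings directive at the start of a line: "::: package.module.symbol"
DIRECTIVE = re.compile(r"^::: (\S+)", re.MULTILINE)


def referenced_symbols(docs_dir):
    """Collect every identifier named in a ``:::`` directive under docs_dir."""
    found = set()
    for page in Path(docs_dir).rglob("*.md"):
        found.update(DIRECTIVE.findall(page.read_text(encoding="utf-8")))
    return found


def missing_symbols(expected, docs_dir):
    """Return the expected identifiers that no page references, sorted."""
    return sorted(set(expected) - referenced_symbols(docs_dir))


if __name__ == "__main__":
    expected = [
        "cifflow.cifmodel.builder.build",
        "cifflow.cifmodel.builder.CifBuilder",
    ]
    missing = missing_symbols(expected, "docs/api")
    if missing:
        raise SystemExit("Undocumented symbols: " + ", ".join(missing))
```

Exiting non-zero lets the same script double as a CI step alongside the docs build.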