Skip to content

fix(eval): align synthetic density occupancy handling#268

Merged
marcuscollins merged 2 commits into
mainfrom
michaelanzuoni/issue-248-density-script-consistency
Jun 23, 2026
Merged

fix(eval): align synthetic density occupancy handling#268
marcuscollins merged 2 commits into
mainfrom
michaelanzuoni/issue-248-density-script-consistency

Conversation

@xraymemory

@xraymemory xraymemory commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • rename synthetic density occupancy internals/CLI toward occupancy_* while keeping occ_* aliases
  • share occupancy-value validation with synthetic structure-factor generation
  • warn on legacy PDB inputs while keeping backward compatibility

Fixes #248

Validation

  • uvx ty check src/sampleworks/eval/generate_synthetic_density.py tests/eval/test_generate_synthetic_density.py
  • uvx ruff check src/sampleworks/eval/generate_synthetic_density.py tests/eval/test_generate_synthetic_density.py
  • PYTHONPATH=src uvx --python 3.12 --with pytest --with loguru --with torch --with joblib --with numpy --with biotite --with atomworks[ml]==2.1.1 --with einx --with hydra-core --with omegaconf --with jax pytest --noconftest tests/eval/test_generate_synthetic_density.py -q

Summary by CodeRabbit

Release Notes

  • New Features

    • Standardized occupancy CLI options and CSV column naming (occupancy-mode / occupancy-values) while still accepting legacy equivalents (occ-mode / occ-values and legacy CSV columns).
    • Improved occupancy validation and file extension handling for clearer input requirements.
  • Tests

    • Added coverage to ensure both canonical and legacy occupancy columns are parsed correctly.
    • Added assertions for validation failures (including occupancy not summing to 1.0) and unsupported file extensions.

Copilot AI review requested due to automatic review settings June 22, 2026 16:51
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 32a08dcc-189b-4696-9bff-e8ef9268a226

📥 Commits

Reviewing files that changed from the base of the PR and between 6b11747 and ba9ab49.

📒 Files selected for processing (1)
  • src/sampleworks/eval/generate_synthetic_density.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/sampleworks/eval/generate_synthetic_density.py

📝 Walkthrough

Walkthrough

generate_synthetic_density.py renames occ_values/occ_mode to occupancy_values/occupancy_mode across BatchRow, _process_single_row, process_batch, and main. VALID_EXTENSIONS is narrowed to .cif/.mmcif; LEGACY_EXTENSIONS (.pdb/.ent) is added with a runtime warning. Occupancy validation is delegated to validate_occupancy_values. CLI and CSV parsing retain backward-compatible legacy aliases. A new test file adds four BatchRow tests.

Changes

Occupancy naming and validation refactor

Layer / File(s) Summary
BatchRow contract: field rename, extension constants, and validation delegation
src/sampleworks/eval/generate_synthetic_density.py
VALID_EXTENSIONS narrowed to .cif/.mmcif; LEGACY_EXTENSIONS added with warning path; dataclass field renamed to occupancy_values; __post_init__ delegates to validate_occupancy_values imported from synthetic_utils; from_dict falls back to legacy occ_values CSV key.
Processing function signatures and call-site updates
src/sampleworks/eval/generate_synthetic_density.py
_process_single_row and process_batch parameter occ_mode renamed to occupancy_mode; all invocations in main updated; row.occupancy_values field access updated; joblib delayed/Parallel import moved to function-local scope in process_batch.
CLI argument parser: new flags with legacy aliases
src/sampleworks/eval/generate_synthetic_density.py
parse_args adds --occupancy-mode and --occupancy-values as canonical options; --occ-mode and --occ-values are retained as aliases sharing dest="occupancy_mode" and dest="occupancy_values".
BatchRow pytest coverage
tests/eval/test_generate_synthetic_density.py
Four tests added: from_dict with occupancy_values column, from_dict with legacy occ_values column, ValueError when occupancy values don't sum to 1.0, and ValueError for unsupported file extension.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • diff-use/sampleworks#75: Modifies the same occupancy-related parameters (occ_mode/occ_values) in generate_synthetic_density.py, overlapping directly with the renamed fields and logic in this PR.

Suggested reviewers

  • k-chrispens
  • marcuscollins

Poem

🐇 Hop hop, the names are clean,
occ_values is no longer seen!
occupancy_values takes the stage,
Legacy aliases kept for old CSV age.
The rabbit stamps the tests with glee —
One naming convention, finally free! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 72.73% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: aligning occupancy handling terminology and validation in the synthetic density module for consistency.
Linked Issues check ✅ Passed All three objectives from issue #248 are fully addressed: variable naming improved (occ_* → occupancy_*) [#248], validation refactored using validate_occ_values() [#248], and file extension validation enhanced with VALID_EXTENSIONS and LEGACY_EXTENSIONS [#248].
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #248 objectives; no unrelated modifications detected in the refactoring of terminology, validation logic, and file extension handling.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch michaelanzuoni/issue-248-density-script-consistency

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Aligns generate_synthetic_density.py’s occupancy naming/validation behavior with the synthetic structure-factor generator, improving consistency across synthetic-data evaluation utilities while preserving legacy CLI/CSV aliases.

Changes:

  • Renames internal occupancy parameters/fields toward occupancy_* while retaining --occ-* CLI flags and occ_values CSV column support.
  • Centralizes occupancy-value validation by reusing validate_occupancy_values() from synthetic_utils.py.
  • Adds legacy-extension handling (VALID_EXTENSIONS vs LEGACY_EXTENSIONS) with warnings for PDB-like inputs and errors for unsupported extensions.
  • Adds focused tests for batch-row parsing and validation behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/sampleworks/eval/generate_synthetic_density.py Renames occupancy args/fields, reuses shared occupancy validation, and warns on legacy PDB extensions while keeping backward-compatible aliases.
tests/eval/test_generate_synthetic_density.py Adds tests covering canonical vs legacy CSV columns, occupancy sum validation, and extension validation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/eval/test_generate_synthetic_density.py (1)

1-34: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add test coverage for legacy PDB extension handling with warning.

The PR objective includes "adding warnings for legacy PDB file formats while preserving backward compatibility." The current tests verify only that unsupported extensions (e.g., .txt) raise an error, but do not verify that legacy extensions (.pdb, .ent) are accepted with a warning. This backward-compatible behavior should be explicitly tested.

✓ Proposed test for legacy extension acceptance with warning
def test_batch_row_accepts_legacy_pdb_extension_with_warning(caplog) -> None:
    """Legacy .pdb files are accepted but emit a deprecation warning."""
    import logging
    caplog.set_level(logging.WARNING)
    
    row = BatchRow(filename="input.pdb")
    
    assert row.filename == "input.pdb"
    assert "legacy PDB format" in caplog.text
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/eval/test_generate_synthetic_density.py` around lines 1 - 34, The test
file is missing coverage for the backward-compatible behavior of accepting
legacy PDB extensions with warnings. Add a new test function after
test_batch_row_rejects_unsupported_extension that creates a BatchRow instance
with a legacy PDB filename (with `.pdb` or `.ent` extension), verifies the
BatchRow is created successfully with the correct filename, and asserts that a
deprecation warning containing text about legacy PDB format is emitted. Use
pytest's caplog fixture to capture and verify the warning message is logged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tests/eval/test_generate_synthetic_density.py`:
- Around line 1-34: The test file is missing coverage for the
backward-compatible behavior of accepting legacy PDB extensions with warnings.
Add a new test function after test_batch_row_rejects_unsupported_extension that
creates a BatchRow instance with a legacy PDB filename (with `.pdb` or `.ent`
extension), verifies the BatchRow is created successfully with the correct
filename, and asserts that a deprecation warning containing text about legacy
PDB format is emitted. Use pytest's caplog fixture to capture and verify the
warning message is logged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ab650c4b-66bc-4c3a-a5c7-d9720eca521c

📥 Commits

Reviewing files that changed from the base of the PR and between 72f5118 and 6b11747.

📒 Files selected for processing (2)
  • src/sampleworks/eval/generate_synthetic_density.py
  • tests/eval/test_generate_synthetic_density.py

@manzuoni-astera

manzuoni-astera commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Validation update after the latest fix:

  • Local validation passed:
    • uvx ty check src/sampleworks/eval/generate_synthetic_density.py tests/eval/test_generate_synthetic_density.py
    • uvx ruff check src/sampleworks/eval/generate_synthetic_density.py tests/eval/test_generate_synthetic_density.py
    • PYTHONPATH=src uvx --python 3.12 --with pytest --with loguru --with torch --with joblib --with numpy --with biotite --with atomworks[ml]==2.1.1 --with einx --with hydra-core --with omegaconf --with jax pytest --noconftest tests/eval/test_generate_synthetic_density.py -q → 4 passed
    • same targeted pytest command without --with joblib → 4 passed
  • GitHub CI run 27974194712 is green for lint, tests, and typecheck across boltz-dev/protenix-dev/rf3-dev.
  • Remaining non-code status: GPU Tests run 27974193520 is still WAITING on its boltz-dev/protenix-dev/rf3-dev jobs, with no failure logs available yet.

@marcuscollins marcuscollins merged commit dd9bef8 into main Jun 23, 2026
8 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update generate_synthetic_density.py naming and validation to be consistent with generate_synthetic_sf.py

4 participants