Fix shatter memory DoS by writing per-leaf results#153

Closed
ClayWarren wants to merge 2 commits intohobuinc:mainfrom
ClayWarren:main

Conversation

@ClayWarren

Motivation

  • A prior change made do_one() return full per-leaf DataFrames and had run() accumulate them in joined_dfs before a final pd.concat, allowing unbounded memory growth on large or adversarial inputs.
  • The goal is to avoid retaining all per-leaf data in memory while preserving the existing per-leaf processing and total point accounting.
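The memory-growth pattern above can be sketched with a toy example. The helper names below (make_leaf_df, the inline totals) are hypothetical illustrations, not silvimetric's API: accumulating every per-leaf DataFrame for a final pd.concat keeps all leaves alive at once, while writing each leaf immediately and retaining only its point count bounds peak memory to a single leaf.

```python
import pandas as pd

# Hypothetical stand-in for per-leaf results; in silvimetric these
# would come from processing each spatial leaf.
def make_leaf_df(i):
    return pd.DataFrame({"xi": [i, i], "yi": [0, 1], "count": [3, 4]})

# Accumulating pattern: every per-leaf DataFrame stays referenced until
# the final pd.concat, so memory grows with the number of leaves.
joined_dfs = [make_leaf_df(i) for i in range(4)]
all_data = pd.concat(joined_dfs)
total_accumulated = int(all_data["count"].sum())

# Streaming pattern: persist each leaf as it is produced (persistence
# elided here) and keep only its point count, so peak memory is bounded
# by one leaf regardless of leaf count.
total_streamed = 0
for i in range(4):
    df = make_leaf_df(i)
    # write(df) would persist the leaf here before it is dropped
    total_streamed += int(df["count"].sum())

# Both approaches account for the same total number of points.
assert total_accumulated == total_streamed
```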

Description

  • Updated src/silvimetric/commands/shatter.py so that do_one() writes each leaf immediately and returns an int point count instead of a DataFrame.
  • do_one() sorts each leaf's joined data by ['xi','yi'], calls write(...) to persist it, and returns the written point count; skipped or empty leaves return 0.
  • run() no longer accumulates DataFrames; it sums per-leaf point counts from Dask futures or compute(...) results and updates config.point_count accordingly.
  • Removed the pd.concat(joined_dfs) path, eliminating unbounded in-memory aggregation while preserving deterministic per-leaf ordering.
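The revised shape described above can be sketched as follows. This is a minimal illustration under assumed, simplified signatures: the real do_one() and run() in shatter.py take different arguments, and write() here is a hypothetical stand-in for the actual persistence layer.

```python
import pandas as pd

def write(df):
    # Hypothetical stand-in for the real persistence layer: pretend to
    # store the leaf and report how many points were written.
    return len(df)

def do_one(leaf_df):
    # Return an int point count instead of a DataFrame; skipped or
    # empty leaves contribute 0.
    if leaf_df is None or leaf_df.empty:
        return 0
    # Deterministic per-leaf ordering before writing.
    ordered = leaf_df.sort_values(["xi", "yi"])
    return write(ordered)

def run(leaves):
    # Aggregate per-leaf point counts; no DataFrames are retained
    # across leaves, so memory stays bounded.
    return sum(do_one(leaf) for leaf in leaves)

leaves = [
    pd.DataFrame({"xi": [1, 0], "yi": [0, 0], "z": [2.0, 3.5]}),
    pd.DataFrame({"xi": [], "yi": [], "z": []}),  # empty leaf -> 0
    None,                                          # skipped leaf -> 0
]
print(run(leaves))  # → 2
```

In the real code the per-leaf calls run as Dask futures, so run() would sum the counts from their results rather than calling do_one() inline.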

Testing

  • Ran python -m compileall src/silvimetric/commands/shatter.py, which succeeded, verifying that the modified file compiles.

@ClayWarren ClayWarren closed this by deleting the head repository Mar 7, 2026
