Releases: saforem2/ezpz
v0.12.7
4 May 2026
- Merge pull request #129 from saforem2/yeet-refactor
426a204 - docs(yeet): collapse uv-Python warning, hoist Complete workflow up, fill in missing venv-create step
6898630 - docs(yeet): expand uv-Python workaround with HPC module + standalone options
386e578 - fix(docs/yeet): drop fictional 'uv venv --copies' flag
9c1c8d8 - docs(yeet): warn about uv-managed Python; track proper fix in TODO §17
9127828 - chore(.gitignore): exclude .venv.tar.gz
1b9d6cd - fix(tar-env): land tarball next to the venv, return absolute path
7049dc8 - docs(yeet): correct the misleading 'incremental syncs' section
1c3347f - docs(yeet): drop the synthetic 4-node example output block
6851e7b - docs(yeet): advocate the tar-env + yeet pair as the scaling default
49cd14d - docs(yeet): update scaling figures to match the canonical pair
9030f92 - docs: escape brackets in admonition titles and tab labels
05cfa1d - docs(zensical): tidy CLI nav order, drop noisy validation warning
f0ddd4a - docs(cli/index): restructure command list with hierarchy + footnote
91b19bf - docs(nav): actually add ezpz tar-env to zensical sidebar
48941b6 - docs(nav): add ezpz tar-env to zensical sidebar
75c5195 - fix(tar-env): keep -v (verbose) flag; print files as they're added
22f2dd0 - fix(tar-env): actually gzip the .tar.gz output
49d747f - feat(yeet,kill): use ezpz.get_logger for timestamped output
fc76fd1 - feat(yeet): hint when a same-named .tar.gz exists nearby
168cbef - fix(yeet): probe rsync for --info=progress2 support, fall back on macOS
76b0f6c - fix(yeet,kill): always pass list (not None) to run() from Click
7795511 - docs(reference): replace hand-coded ANSI HTML with plain code fences
848e03c - docs(test): refresh stale logger paths in cli/test.md
c62a0e8 - docs: address audit findings (#2, #4, #5, #6)
89d8793 - docs(cli): promote yeet, kill, tar-env out of the experimental block
0a19a67 - docs(yeet): switch scaling plots to linear axes
f4eef10 - docs(yeet): match the original split-chart style
fd98514 - docs(yeet): restore scaling guidelines on the split SVGs
9ea0a69 - docs(yeet): split scaling chart into two separate SVGs
7b673ae - docs(yeet): use the scaling SVG from the companion blog post
4c8be0b - docs(yeet): linear y-axis on scaling plot
d43a114 - docs(yeet): polish: better lead, translucent palette, single combined SVG
8399675 - docs(yeet): color greedy fan-out diagram by generation
8a43b61 - chore(todo): add §15 (ZeRO-1 in wrap_model) and §16 (explicit DeepSpeed wrapper)
e0a1f46 - test(yeet,kill): strengthen coverage from review
e1a29a8 - docs(kill): capitalize Python (proper noun)
db9db34 - fix(yeet): generic-source footer counts only successful nodes
315200d - fix(kill): correct success accounting + drop StrictHostKeyChecking
01bd7c6 - docs(fsdp): make memory-block widths proportional to actual GB
cfac686 - docs(yeet): add first-step latency column to Aurora scaling table
0275cb1 - docs(yeet): add Aurora 8–4096 node scaling results
b44479d - docs(kill): document ezpz kill command
e8fe05d - feat(kill): add ezpz kill for cleaning up stuck distributed jobs
04cd14f - docs(yeet): rename yeet-env.md → yeet.md, document positional + generic source
b1af0df - feat(yeet): support arbitrary directory sources, not just venvs
9e6117b - feat(yeet): accept positional SRC argument
f085179 - refactor(yeet): rename yeet-env β yeet, keep deprecated alias
d55c38a - fix(utils.sh): use log_message in ezpz_save_pbs_env "to calculate" block
dbaeaed - chore(todo): add Β§14 β bin/utils.sh cleanup items
5ddc52f
v0.12.6
v0.12.5
v0.12.4
v0.12.3
v0.12.2
v0.12.1
v0.12.0
14 April 2026
- Merge pull request #122 from saforem2/dev
9a5fab9 - chore: Move bench_trackers.sh into scripts/
bfa95f6 - docs(guide): Use prefixed keys, logger, and finalize in DDP example
8d2b645 - fix(docs): Clarify that RNG seeding is opt-in via seed= parameter
21f9d11 - docs(TODO): Add tracker follow-up items
74c2b61 - fix(tracker): Improve MLflow error messages, suppress per-step 403 spam
ff5aa15 - fix(tracker): Fix MLflow auth patch and experiment naming
59f636b - revert(tracker): Restore MLFLOW_TRACKING_TOKEN guard for system metrics
413b59a - docs(history): Document grouped finalize() output and return type
45f5505 - fix(history): Log grouped datasets clearly from finalize()
7b42891 - fix(dist): Work around xccl split_group regression in PyTorch nightly
a5fd044 - refactor(history): Save per-group datasets without NaN padding
a74d0d2 - fix(history): Include group prefix in plot titles and filenames
ca816f0 - refactor(history): Group metrics by prefix for independent plot axes
30824c5 - fix(tracker): Enable MLflow system metrics unconditionally
d103a43 - docs(quickstart): Update Next Steps with new pages and labels
ef485f5 - docs(index): Update overview and features for new functionality
1d1200b - fix(dist): Use explicit empty-string check for env var lookups
b59d981 - fix(docs): Add missing nullcontext import in gradient accumulation recipe
d6e926c - fix(log): Reset time/date styles to empty defaults
e39de96 - fix(docs): Remove broken anchor link to deleted reference.md section
d6e4fa1 - docs: Merge tracker guide into experiment tracking page
75c9ec3 - docs(recipes): Add data loading, checkpointing, and gradient accumulation
a57e2b6 - docs(quickstart): Use FSDP default, update cross-links
34aa13b - docs: Rename "Complete Example" to "End-to-End Walkthrough"
8d31c46 - docs(config): Add common configurations section and distributed training guide
770b48b - docs(architecture): Simplify dist.py shim description, add guide link
43aacbe - docs(troubleshooting): Add distributed hang and FSDP error sections
0b84eb8 - style(docs): Update font families and add mermaid diagram styles
3b8521e - fix(docs): Correct ccl→xccl backend names, fix bracket syntax
90b85be - chore(log): Remove commented-out style lines
f08c5e8 - feat(tracker): Support EZPZ_TRACKERS shorthand env var alias
ae7823a - fix(submit): Default --launch to on, add --no-launch to opt out
dc70989 - fix(init): Add semver-safe get_torch_version_tuple() and use it
f6248e8 - fix(log): Close single-bracket even when no prefix components render
d6a4812 - fix(history): Fix and/or precedence in _tracker_got_config check
fec5fa5 - feat(benchmark): Add --run alias with comma-separated example names
a60f602 - feat(report): Add MLflow links, fix table column alignment
f819015 - docs(quickstart): Add linenums to code blocks and fix line wrapping
207bbb0 - fix(history): Fall back to EZPZ_TRACKER_BACKEND env var and fix formatting
37938c2 - style(log): Adjust day/time and repr.colon styles for readability
e0202c4 - style(tracker): Color mlflow label bright red in stderr output
9999e03 - fix(pbs): Cache qstat results to avoid redundant calls during launch
c91a078 - fix(dist): Respect pre-set MASTER_ADDR/MASTER_PORT in DDP setup
6251373 - fix(tests): Scrub stale distributed env vars in FSDP-TP launch test
a20b9b5 - fix(tests): Skip MPI tests on non-HPC and prevent hangs
2a09676 - fix(pbs): Retry qstat on transient PBS server errors
ac44ff9 - fix(pbs): Stop sh import error from polluting test output
d8b271e - fix: Address review findings from PR #122
f27156b - fix(log): Swap day_color and time_color styles
5ffd71b - fix(tracker): Improve MLflow dotenv error handling
776977c - feat(dist): Add setup_mlflow() convenience function
1191120 - fix(log): Improve log timestamp visibility
d13774d - docs: Fix README default, consolidate backend docs, rename Reference
cf2a440 - fix(tests): Accept AssertionError in PBS nodefile path test
773a75f - docs(recipes): Add MLflow tracking recipe
7231d16 - docs(tracker): Document MLflow as built-in backend with full setup guide
a069108 - feat(dist): Add --fsdp-sharding-strategy CLI arg and reshard_after_forward
a8fa27f - fix(dist): Skip ModuleList/ModuleDict in _wrap_fsdp2
569b164 - docs: Add timing comparison table stub to tracker docs
89623a6 - chore: Add tracker backend benchmark script
fc94e85 - feat(dist): Make FSDP2 (fully_shard) the default in wrap_model
f8f4c38 - test(log): Add tests for log config env vars and prefix styles
f1e7aae - feat(tra...
v0.11.3
29 March 2026
- chore(nav): Restructure nav, add Recipes, promote FAQ to Guide
1935686 - test(recipes): Add tests for docs recipe code snippets
58fe520 - docs(history): Add metric tracking guide for History class
536a5bc - docs(quickstart): Add uv-run and ezpz-test verification sections
d33f03d - docs(faq): Add general FAQ section and fix HTML tag
8083783 - docs(index): Streamline homepage with better examples and try-it-out section
8306281 - fix(docs): Update CLI references from ezpz-launch to ezpz launch
601a81b - docs(recipes): Add Polaris output tabs from 2-node run
8f7cd5f - chore: Update scripts/capture_recipe_outputs.sh
30673c0 - docs(recipes): Add tabsets with runnable code + output
f2456b5 - chore: Add new filters for Aurora
aa22a39 - chore: Add new filters for Aurora
fe1e408 - chore: Update arguments in examples/diffusion.py
211fcbb - chore: Update scripts/*
330e269 - chore: Add new filters for Aurora
3acb752 - chore: Update arguments in examples/diffusion.py
ea06637 - chore: Update scripts/*
397a30f - chore: Update scripts/run_benchmarks.py
5354443 - chore: Update scripts/run_benchmarks.py
c50a6a5 - fix(examples/hf): handle removed overwrite_output_dir attribute
2f852d5 - fix(scripts): remove deprecated --include-tokens-per-second flag
db99de2 - docs: remove duplicate guide.md and update nav
a008e4f - docs(faq): trim verbose MPI and launcher output
85e8ea2 - docs(examples): add example picker table and intro paragraph
52d3207 - docs(cli): reorganize command listing and rewrite launch page
906063a - docs(architecture): fix xccl backend name and add wrapping strategy table
b73c71b - fix(history): add detach() before numpy conversion
59782fa - chore: Align tables in scripts/generate_report.py
ee18997 - chore: Update scripts/run_benchmarks.sh
ed0e463 - docs(reference): annotate SequentialLinearNet import
faaef4b - docs(index): add target audience line
e01b538 - docs(examples): deepen import and preset annotations
0a475e9 - docs(includes): collapse MPI noise and remove orphaned files
15a6ea3 - docs: slim reference.md and update cross-links
9086a30 - docs(quickstart): migrate shell env, launcher examples, and API cheat sheet
59cc228 - fix(docs): use sequential numbered lists in reference and configuration
370d576 - fix: Fix missing import in distributed.py
d9dc204 - feat(scripts): add benchmark runner and report generator
b0e7f27 - refactor(examples): initialise wandb earlier for full console capture
3c0532c - fix(log): forward redirect param in get_console to fix wandb log capture
7b6d437 - docs: use tags in summary elements for inline code rendering
0d4c4bc - docs: add markdown attr to walkthrough details for inline code rendering
e43fe11 - docs: collapse Code Walkthrough subsections by default
84988df - docs: rewrite walkthroughs to cover full source files top-to-bottom
05cf563 - docs: move collapsed Source sections to top of example pages
640e12c - docs: deprecate minimal example, add collapsed source to all examples
8e24879 - fix(distributed): fall back to DDP when FSDP is unsupported on CPU/MPS
f828d3c - fix(tests): update slurm test expectations for --gpus-per-node flag
11e5ec3 - docs: remove What to Expect sections, use direct source code
b5ee90b - docs: add diagrams, fix quickstart prose, reorder example nav
1ab7545 - docs: add walkthroughs and expected output to example pages
ebabc5c - docs: add HF causal LM example page with code walkthrough
64450f0 - docs: add minimal example page with code walkthrough
51e35f7 - docs: restructure nav, update README, and polish site
ca2787e - docs: add key API callouts to example pages
94bae50 - docs: add architecture, guide, troubleshooting, and perlmutter pages
6e54cfe - fix(utils.sh): load required modules for Perlmutter conda setup
3234016 - fix(distributed): zero-pad node index in print_dist_setup for alignment
e39fb85 - fix(slurm): skip (null) nodelist entries and add --gpus-per-node to srun
94bf48f - refactor(history): clean up log_metrics output
4640b5e - fix(history): use correct path in report log message
d0f427e - Merge branch 'wip' of https://github.com/saforem2/ezpz into wip
78a2ed2 - Merge branch 'wip' of https://github.com/saforem2/ezpz into wip
b219b3f - Merge branch 'wip' of https://github.com/saforem2/ezpz into wip
d6857cc - test(distributed,launch): fix tests that fail inside MPI jobs
28154e7 ...
v0.11.2
1 March 2026
- Merge pull request #119 from saforem2/dev
d260a27 - docs: Update docs
76af046 - chore: Update examples/hf_trainer.py
5624cb2 - docs: Update docs
bd932cb - chore: Update pyproject.toml
06a3858 - chore: Update zensical.toml
4e1fdd2 - docs: Update includes/*
a4b3bc8 - chore: Update src/ezpz/__init__.py
83de93a - chore: Update src/ezpz/cli/flags.py
0f668bd - chore: Update src/ezpz/launch.py
bc0af25 - chore: Update src/ezpz/pbs.py
1d0e214 - chore: Update src/ezpz/test.py
576820b - chore: Update tests/
40dfbb5 - chore: Update ezpz/examples/*
de2a724 - chore: Update ezpz/examples/cria.py
b47d6ea - chore: catch empty train history gracefully
63e2c51 - chore: Update timings in examples/vit.py
4379632 - chore: Group timings in examples/*.py
d39cf41 - chore: add @ezpz.timeitlogit decorators to examples/*.py
d51941f - feat: Update examples/hf.py
30e6116 - chore: Update examples/test.py, cli/flags.py
09c0848 - chore: Update src/ezpz/dist.py
a4438a3 - chore: Track branch in wandb.run.config
7a5023b - feat: Add timings to examples/*.py
847b12c - chore: Update src/ezpz/configs.py
77abd53 - chore: Update src/ezpz/dist.py
9aaf67c - chore: Update JSON logger in ezpz/launch.py
f4f451b - feat: Unified, consistent directory names in examples/*.py
5465be2 - feat: Unified, consistent directory names in examples/*.py
7542852 - Update ezpz/log/formatters.py
4d65ed2 - chore: Update src/ezpz/dist.py
768ef60 - chore: Update src/ezpz/dist.py
38e3378 - chore: Update ezpz/dist.py
19a4b70 - chore: Update src/ezpz/log/formatters.py
7f3a9dc - chore: Update src/ezpz/cli/test_cmd.py
029acef - chore: Update src/ezpz/conf/hydra/job_logging/custom.yaml
d7a7286 - chore: Update src/ezpz/configs.py
c740b0e - chore: Update src/ezpz/launch.py
64b2ce8 - chore: Update src/ezpz/log/__init__.py
93d7780