PolicyEngine · MaxGhenis · Jun 19, 2026
diff --git a/changelog.d/longwise-local-geography.md b/changelog.d/longwise-local-geography.md
@@ -0,0 +1 @@
+- Add an OA-first long-format local geography weights artifact (`local_geography_weights.csv.gz`) so UK constituency and local-authority consumers can migrate away from dense area-by-household weight matrices.
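To show what the migration changes, here is a minimal sketch that converts a dense area-by-household weight matrix into long rows and keeps only the non-zero cells. The function name `dense_to_long` and the column names are hypothetical. They mirror the columns listed for the sidecar but are not taken from the PR's code.

```python
import numpy as np
import pandas as pd


def dense_to_long(weights, area_codes, household_ids, area_type):
    """Hypothetical helper: dense (n_areas, n_households) matrix -> long rows.

    Only non-zero cells become rows, so storage scales with the number of
    (area, household) pairs that actually carry weight.
    """
    weights = np.asarray(weights, dtype=float)
    area_codes = np.asarray(area_codes)
    household_ids = np.asarray(household_ids)
    if weights.shape != (len(area_codes), len(household_ids)):
        raise ValueError("weights must have shape (n_areas, n_households)")
    # Row-major indices of every non-zero cell.
    area_idx, hh_idx = np.nonzero(weights)
    return pd.DataFrame(
        {
            "area_type": area_type,  # scalar, broadcast to every row
            "area_code": area_codes[area_idx],
            "household_id": household_ids[hh_idx],
            "weight": weights[area_idx, hh_idx],
        }
    )


# Two areas by three households, with one assigned household per area.
example = dense_to_long(
    [[0.0, 5.0, 0.0], [3.0, 0.0, 0.0]],
    area_codes=["E1", "E2"],
    household_ids=[101, 102, 103],
    area_type="constituency",
)
```

A consumer that used to read one row of the dense matrix would instead filter the long table on `area_type` and `area_code`. When writing such a frame, pandas infers gzip compression from a `.csv.gz` suffix in `to_csv(path, index=False)`.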
diff --git a/docs/oa_calibration_pipeline.md b/docs/oa_calibration_pipeline.md
@@ -132,15 +132,19 @@ Generate per-area H5 files from sparse L0-calibrated weights.
 
 **Deliverables:**
 - `policyengine_uk_data/calibration/publish_local_h5s.py` — extracts per-area H5 subsets from the sparse weight vector; each H5 contains only active households (non-zero weight) with their calibrated weights, plus the linked person and benunit rows
+- `policyengine_uk_data/calibration/long_geography.py` — exports matrix-free local geography weights as an OA-first long table, with constituency and LA rows derived from assigned OA geography
 - `datasets/create_datasets.py` — publish step wired in after calibration, before downrating
 - `tests/test_publish_local_h5s.py` — 13 tests covering area-household mapping, H5 structure, pruned-household exclusion, weight correctness, person/benunit FK integrity, full publish cycle, summary statistics, and validation
 
 **Key design:**
 - `_get_area_household_indices()`: maps each area code to its household row indices via OA geography columns from clone-and-assign
+- `write_long_geography_weights()`: writes `storage/local_geography_weights.csv.gz`, a long sidecar with `area_type`, `area_code`, household identifiers, source-year/source-household provenance, and weights. The production build writes only the assigned-geography rows; explicit conversion of 2D H5 matrices is available only for small compatibility checks, because materializing dense area-by-household matrices is too expensive for routine builds
+- `geography_support_report()`: summarizes low-support areas using unique source households and effective sample size, so clone count and future pooled-FRS builds can be evaluated without mistaking cloned rows for independent evidence
 - `publish_area_h5()`: writes a single H5 per area — filters to active (non-zero weight) households, extracts linked persons and benunits via FK joins, stores as HDF5 groups with metadata attributes
 - `publish_local_h5s()`: orchestrates the full publish cycle — loads L0 weight vector, iterates over all areas, writes H5 files to `storage/local_h5s/{area_type}/`, produces `_summary.csv` with per-area statistics
 - `validate_local_h5s()`: post-publish validation checking file existence, HDF5 structure, and cross-area household ID uniqueness
 - Supports both constituency (650) and LA (360) area types
 - Zero-weight households (L0-pruned) are excluded from area H5 files — only active records are published
+- The legacy `parliamentary_constituency_weights.h5` and `local_authority_weights.h5` artifacts are still produced during migration; new consumers should prefer the OA-first `local_geography_weights.csv.gz` sidecar.
 
 **US reference:** PR #465 (modal)
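The OA-first layout described above can be sketched in pandas as follows. The sketch assumes that each cloned household carries one assigned `oa_code` and that a separate OA lookup maps each OA to exactly one constituency and one LA. The function `long_geography_sketch`, the lookup table, and column names such as `oa_code`, `constituency_code`, `la_code`, and `weight` are hypothetical, and the PR's `write_long_geography_weights()` may be structured differently.

```python
import pandas as pd

# Hypothetical (area_type, column) pairs; OA rows are emitted first.
AREA_COLUMNS = [
    ("output_area", "oa_code"),
    ("constituency", "constituency_code"),
    ("local_authority", "la_code"),
]
ID_COLUMNS = ["household_id", "source_year", "source_household_id"]


def long_geography_sketch(households, oa_lookup):
    """Build OA rows, then derive constituency and LA rows from the same OA."""
    hh = households.merge(
        oa_lookup, on="oa_code", how="left", validate="many_to_one"
    )
    if hh[["constituency_code", "la_code"]].isna().any().any():
        raise ValueError("every assigned OA must map to a constituency and an LA")
    frames = []
    for area_type, column in AREA_COLUMNS:
        part = (
            hh[[column, *ID_COLUMNS, "household_weight"]]
            .rename(columns={column: "area_code", "household_weight": "weight"})
            .assign(area_type=area_type)
        )
        frames.append(part)
    long = pd.concat(frames, ignore_index=True)
    return long[["area_type", "area_code", *ID_COLUMNS, "weight"]]


households = pd.DataFrame(
    {
        "household_id": [10, 11, 12],
        "source_year": [2023, 2023, 2023],
        "source_household_id": [1, 1, 2],  # households 10 and 11 are clones
        "household_weight": [50.0, 50.0, 80.0],
        "oa_code": ["OA1", "OA2", "OA3"],
    }
)
oa_lookup = pd.DataFrame(
    {
        "oa_code": ["OA1", "OA2", "OA3"],
        "constituency_code": ["C1", "C1", "C2"],
        "la_code": ["LA1", "LA1", "LA1"],
    }
)
example = long_geography_sketch(households, oa_lookup)
```

Each household contributes exactly one row per area type. The table therefore grows with households times area types rather than households times areas. Because the constituency and LA rows come from the same OA assignment, their weight totals stay consistent with the OA rows.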
diff --git a/policyengine_uk_data/calibration/clone_and_assign.py b/policyengine_uk_data/calibration/clone_and_assign.py
@@ -20,7 +20,6 @@
 
 from policyengine_uk_data.calibration.oa_assignment import (
     assign_random_geography,
-    GeographyAssignment,
 )
 
 logger = logging.getLogger(__name__)
@@ -98,8 +97,6 @@ def clone_and_assign(
     benunit = dataset.benunit
 
     n_households = len(hh)
-    n_persons = len(person)
-    n_benunits = len(benunit)
 
     logger.info(
         "Cloning %d households x %d = %d total records",
@@ -192,6 +189,9 @@ def clone_and_assign(
 
         # Clone household table
         hh_clone = hh.copy()
+        hh_clone["source_household_id"] = hh_id_col
+        if "source_year" not in hh_clone.columns:
+            hh_clone["source_year"] = dataset.time_period
         hh_clone["household_id"] = new_hh_ids
         hh_clone["household_weight"] = hh["household_weight"].values / n_clones
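The `source_household_id` and `source_year` columns added to each clone above are what let a support check count independent evidence rather than cloned rows. Below is a minimal sketch of such a check. It assumes a long table with the hypothetical columns `area_type`, `area_code`, `source_year`, `source_household_id`, and `weight`. It also assumes that effective sample size is Kish's formula, (sum of w)^2 / (sum of w^2), applied after summing clone weights within each source household. That is one reasonable choice, and the PR's `geography_support_report()` may compute it differently.

```python
import pandas as pd


def support_report_sketch(long_weights):
    """Per-area support: rows, unique source households, and Kish ESS.

    Assumption: clones of one source household in the same area are merged
    into a single observation before computing ESS.
    """
    keys = ["area_type", "area_code"]
    active = long_weights[long_weights["weight"] > 0]  # drop pruned rows
    by_source = active.groupby(
        keys + ["source_year", "source_household_id"], as_index=False
    )["weight"].sum()
    by_source["weight_sq"] = by_source["weight"] ** 2
    report = by_source.groupby(keys).agg(
        n_source_households=("weight", "count"),
        weight_sum=("weight", "sum"),
        weight_sq_sum=("weight_sq", "sum"),
    )
    report["n_rows"] = active.groupby(keys).size()
    report["effective_sample_size"] = (
        report["weight_sum"] ** 2 / report["weight_sq_sum"]
    )
    # Lowest-support areas first.
    return (
        report.drop(columns="weight_sq_sum")
        .reset_index()
        .sort_values("effective_sample_size")
        .reset_index(drop=True)
    )


long_weights = pd.DataFrame(
    {
        "area_type": ["constituency"] * 5,
        "area_code": ["C1", "C1", "C1", "C2", "C2"],
        "household_id": [10, 11, 12, 13, 14],
        "source_year": [2023] * 5,
        "source_household_id": [1, 1, 2, 3, 3],
        "weight": [50.0, 50.0, 80.0, 40.0, 0.0],
    }
)
report = support_report_sketch(long_weights)
```

In this toy data, area `C1` has three rows but only two source households. Treating the two clones of source household 1 as independent would give a row-level ESS of about 2.84, compared with about 1.98 once their weights are combined.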