GitHub - AI-sandbox/hprc-pclai: Point Cloud Local Ancestry Inference (PCLAI) on HPRC Release 2 samples

Point Cloud Local Ancestry Inference (PCLAI) — HPRC Release 2

Quick links: Read the Manual: PCLAI Manual v.0.1 | Reference PCA space (PC1–PC2 + metadata): Reference PCA Metadata | Index files: GRCh38 - CHM13 - Assembly | Official PCLAI repo: PCLAI Code | Read the Preprint: biorxiv

PCLAI is a deep learning-based approach for inferring continuous population genetic structure along the genome. Instead of assigning each genomic window to a discrete ancestry label, PCLAI predicts a continuous coordinate (e.g., a point in PC1–PC2 space) for every window, together with a per-window confidence score.

What PCLAI provides

For each genomic window (1000 SNPs), PCLAI outputs:

Continuous coordinates per window
A low-dimensional coordinate (e.g., (PC1, PC2)) representing where that window lies in a reference genetic space.
Confidence score per window
A value in [0, 1000] where higher = more confident. We filter out very low-confidence predictions in the distributed BED files.
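Readers who want a stricter cutoff than the one already applied to the distributed files can filter on the score column themselves. The following is a minimal sketch, assuming the BED layout described under "Output format (BED)" (score in the fifth column); the function name `filter_by_confidence` and the default threshold of 500 are illustrative assumptions, not values used by the authors.

```python
from typing import Iterable, List


def filter_by_confidence(bed_lines: Iterable[str], min_score: int = 500) -> List[str]:
    """Keep BED rows whose confidence score (column 5) is >= min_score.

    min_score=500 is an arbitrary illustrative threshold, not the cutoff
    used to produce the distributed files.
    """
    kept = []
    for line in bed_lines:
        stripped = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        fields = stripped.split()
        score = int(fields[4])  # BED score column, expected in [0, 1000]
        if score >= min_score:
            kept.append(stripped)
    return kept
```

Usage: pass an open file handle (or any iterable of lines) and pick a threshold suited to the downstream analysis.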

PCLAI is naturally a regression method in a coordinate space. For HPRC Release 2, coordinates are reported in PCA space as a default surrogate for genetic distance.

How HPRC Release 2 results were generated (high level)

Reference embedding: Construct a reference PCA embedding (from 1000 Genomes using the Reference PCA Metadata).
Windows: Split each haplotype into fixed windows of 1000 SNPs.
Inference: Predict a coordinate for each window in the reference PCA space.
Confidence: Output a confidence score per window for QC / filtering.
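The windowing step above can be sketched as follows. This is a rough illustration under stated assumptions: SNP positions are 0-based and sorted, each window's interval runs from its first SNP to one past its last SNP, and a trailing partial window is dropped. How PCLAI actually sets window boundaries and handles the final partial window is not specified here, and `snp_windows` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def snp_windows(positions: Sequence[int], window_size: int = 1000) -> List[Tuple[int, int]]:
    """Split sorted 0-based SNP positions into consecutive fixed-size windows.

    Returns BED-style (start, end) intervals: start is inclusive, end is
    exclusive. A trailing window with fewer than window_size SNPs is
    dropped (an assumption made for this sketch).
    """
    windows = []
    for i in range(0, len(positions) - window_size + 1, window_size):
        chunk = positions[i:i + window_size]
        windows.append((chunk[0], chunk[-1] + 1))
    return windows
```

Because windows are defined by SNP count rather than base pairs, their physical length varies with local SNP density.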

Discrete ancestry labeling is optional: you can bin coordinates into categories after the fact, but the primary output is continuous. If you require PCLAI discretization for downstream tasks, consult our Manual.
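One simple way to bin coordinates after the fact is nearest-centroid assignment, sketched below. This is not necessarily the discretization procedure described in the Manual; the function name `nearest_centroid` and the centroid labels and coordinates in the usage note are hypothetical placeholders.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float]


def nearest_centroid(coord: Coord, centroids: Dict[str, Coord]) -> str:
    """Return the label of the centroid closest to coord (Euclidean distance).

    centroids maps a label to a point in the same coordinate space as the
    predictions; the caller is responsible for supplying meaningful values.
    """
    if not centroids:
        raise ValueError("at least one centroid is required")
    return min(centroids, key=lambda label: math.dist(coord, centroids[label]))
```

For example, with placeholder centroids `{"cluster_a": (0.0, -1.0), "cluster_b": (2.0, 2.0)}`, the point `(0.438, -1.398)` is assigned to `cluster_a`.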

If you need to inpaint missing windows for downstream tasks, see the recommendation in our Manual.

Output format (BED)

We provide local ancestry results as BED, which works well in genome browsers and supports interval coloring via itemRgb.

Field	Description
`chrom`	Chromosome
`chromStart`	Window start (0-based, inclusive)
`chromEnd`	Window end (0-based, exclusive)
`name`	`{sample}/{hap}/{chrom}_wXXXX_(x,y)` where `(x,y)` are the predicted coordinates (e.g., `(PC1,PC2)`)
`score`	Confidence score in [0,1000] (higher = more confident)
`strand`	`.`
`thickStart`	equals `chromStart`
`thickEnd`	equals `chromEnd`
`itemRgb`	`R,G,B` color derived from the predicted coordinate (exported as RGB; generated from a perceptual mapping)
`centroid`	Discretized PCLAI annotation of the window corresponding to the ancestry centroid

Example BED row:

chr1    14486   805864  HG00097/h1/chr1_w0001_(0.438,-1.398)    991 .   14486   805864  222,162,255
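A row like the one above can be unpacked with a few lines of Python. This is a minimal sketch, assuming whitespace-separated columns and the `name` pattern from the field table; `PclaiWindow` and `parse_pclai_bed_line` are hypothetical names introduced for illustration, and the `centroid` column is treated as optional because the example row does not include it.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Pattern for the name field: {sample}/{hap}/{chrom}_wXXXX_(x,y)
_NAME_RE = re.compile(
    r"^(?P<sample>[^/]+)/(?P<hap>[^/]+)/(?P<chrom>.+)_(?P<win>w\d+)_\((?P<coords>[^)]*)\)$"
)


class PclaiWindow(NamedTuple):
    chrom: str
    start: int                 # 0-based, inclusive
    end: int                   # 0-based, exclusive
    sample: str
    hap: str
    window_id: str
    coords: Tuple[float, ...]  # predicted coordinates, e.g. (PC1, PC2)
    score: int                 # confidence in [0, 1000]
    rgb: Tuple[int, ...]
    centroid: Optional[str]    # present only if a 10th column exists


def parse_pclai_bed_line(line: str) -> PclaiWindow:
    """Parse one PCLAI BED row into a PclaiWindow."""
    fields = line.split()
    if len(fields) < 9:
        raise ValueError(f"expected at least 9 columns, got {len(fields)}")
    match = _NAME_RE.match(fields[3])
    if match is None:
        raise ValueError(f"unexpected name field: {fields[3]!r}")
    coords = tuple(float(v) for v in match["coords"].split(","))
    rgb = tuple(int(v) for v in fields[8].split(","))
    centroid = fields[9] if len(fields) > 9 else None
    return PclaiWindow(
        chrom=fields[0], start=int(fields[1]), end=int(fields[2]),
        sample=match["sample"], hap=match["hap"], window_id=match["win"],
        coords=coords, score=int(fields[4]), rgb=rgb, centroid=centroid,
    )
```

Applied to the example row, this yields sample `HG00097`, haplotype `h1`, window `w0001`, coordinates `(0.438, -1.398)`, and score `991`.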

Visualization tip: itemRgb lets you color each window by position in the embedding (e.g., mapping a 2D coordinate into a perceptual color space → RGB), so continuous shifts along the genome are visually apparent.
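To illustrate the general idea of turning a 2D coordinate into an `itemRgb` value, here is a deliberately simple sketch that maps the angle of the point to hue and its distance from the origin to saturation. It uses HSV from the Python standard library as a stand-in, which is not a perceptually uniform color space and is not the mapping used to generate the distributed files; `coord_to_rgb`, `to_item_rgb`, and `max_radius` are hypothetical names and parameters.

```python
import colorsys
import math
from typing import Tuple


def coord_to_rgb(x: float, y: float, max_radius: float = 3.0) -> Tuple[int, int, int]:
    """Map a 2D coordinate to an (R, G, B) triple with values in 0..255.

    Hue encodes the direction of the point from the origin; saturation
    encodes its distance, clipped at max_radius (an arbitrary scale chosen
    for this sketch).
    """
    hue = (math.atan2(y, x) / (2 * math.pi)) % 1.0
    saturation = min(math.hypot(x, y) / max_radius, 1.0)
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return (round(255 * r), round(255 * g), round(255 * b))


def to_item_rgb(rgb: Tuple[int, int, int]) -> str:
    """Format an RGB triple as a BED itemRgb string, e.g. '222,162,255'."""
    return ",".join(str(c) for c in rgb)
```

With this scheme, points near the origin come out close to white, and points in different directions take on different hues, so adjacent windows with similar coordinates get similar colors.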

Can I train my own PCLAI model?

Yes! If you want to train PCLAI on your own data, follow the steps in our official PCLAI repo.

Cite

When using the PCLAI method or PCLAI outputs, please cite the following paper:

@article{geleta_pclai_2026,
    author = {Geleta, Margarita and Mas Montserrat, Daniel and Ioannidis, Nilah M. and Ioannidis, Alexander G.},
    title = {{Point cloud local ancestry inference (PCLAI): continuous coordinate-based ancestry along the genome}},
    year = {2026},
    journal = {bioRxiv},
    doi = {10.64898/2026.03.23.713813}
}
