Quick links: Read the Manual: PCLAI Manual v.0.1 | Reference PCA space (PC1–PC2 + metadata): Reference PCA Metadata | Index files: GRCh38 - CHM13 - Assembly | Official PCLAI repo: PCLAI Code | Read the Preprint: biorxiv
PCLAI is a deep learning-based approach for inferring continuous population genetic structure along the genome. Instead of assigning each genomic window to a discrete ancestry label, PCLAI predicts a continuous coordinate (e.g., a point in PC1–PC2 space) for every window, together with a per-window confidence score.
For each genomic window (1000 SNPs), PCLAI outputs:
- Continuous coordinates per window: A low-dimensional coordinate (e.g., (PC1, PC2)) representing where that window lies in a reference genetic space.
- Confidence score per window: A value in [0, 1000] where higher = more confident. We filter out very low-confidence predictions in the distributed BED files.
PCLAI is naturally a regression method in a coordinate space. For HPRC Release 2, coordinates are reported in PCA space as a default surrogate for genetic distance.
- Reference embedding: Construct a reference PCA embedding (from 1000 Genomes using the Reference PCA Metadata).
- Windows: Split each haplotype into fixed windows of 1000 SNPs.
- Inference: Predict a coordinate for each window in the reference PCA space.
- Confidence: Output a confidence score per window for QC / filtering.
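As a rough illustration of the windowing step, the sketch below splits a sorted list of SNP positions into fixed windows of 1000 SNPs and reports each window as a half-open interval. The function name `snp_windows`, the way window boundaries are derived from the first and last SNP, and the handling of a shorter final window are assumptions made for this example; they are not taken from the PCLAI code.

```python
from typing import List, Tuple


def snp_windows(positions: List[int], window_size: int = 1000) -> List[Tuple[int, int]]:
    """Group sorted 0-based SNP positions into windows of `window_size` SNPs.

    Returns one (start, end) pair per window, 0-based and end-exclusive.
    Assumption: a final window with fewer SNPs is kept rather than dropped.
    """
    windows = []
    for i in range(0, len(positions), window_size):
        chunk = positions[i:i + window_size]
        # Start at the first SNP in the chunk, end one past the last SNP.
        windows.append((chunk[0], chunk[-1] + 1))
    return windows


# Toy input: 2500 SNPs spaced 10 bp apart on one haplotype.
toy_positions = list(range(0, 25000, 10))
toy_windows = snp_windows(toy_positions)
```

Because windows are defined by SNP count rather than base pairs, their genomic length varies with local SNP density.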
Discrete ancestry labeling is optional: you can bin coordinates into categories after the fact, but the primary output is continuous. If you require PCLAI discretization for downstream tasks, consult our Manual.
If you need to inpaint missing windows for downstream tasks, see the recommendations in our Manual.
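One simple way to bin continuous coordinates into categories after the fact is nearest-centroid assignment. The sketch below is only an illustration of that idea: the centroid labels and values in `CENTROIDS` are placeholders invented for this example, and the discretization described in the Manual may differ.

```python
import math
from typing import Dict, Tuple

# Hypothetical centroids in (PC1, PC2) space; labels and values are placeholders.
CENTROIDS: Dict[str, Tuple[float, float]] = {
    "group_A": (0.5, -1.4),
    "group_B": (-1.0, 0.8),
    "group_C": (1.2, 1.1),
}


def nearest_centroid(coord: Tuple[float, float],
                     centroids: Dict[str, Tuple[float, float]] = CENTROIDS) -> str:
    """Return the label of the centroid closest to `coord` (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(coord, centroids[label]))


# Assign a label to one predicted window coordinate.
label = nearest_centroid((0.438, -1.398))
```

In practice you would replace the placeholder centroids with ones computed from the reference PCA embedding.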
We provide local ancestry results as BED, which works well in genome browsers and supports interval coloring via itemRgb.
| Field | Description |
|---|---|
| chrom | Chromosome |
| chromStart | Window start (0-based, inclusive) |
| chromEnd | Window end (0-based, exclusive) |
| name | {sample}/{hap}/{chrom}_wXXXX_(x,y) where (x,y) are the predicted coordinates (e.g., (PC1,PC2)) |
| score | Confidence score in [0,1000] (higher = more confident) |
| strand | . |
| thickStart | equals chromStart |
| thickEnd | equals chromEnd |
| itemRgb | R,G,B color derived from the predicted coordinate (exported as RGB; generated from a perceptual mapping) |
| centroid | Discretized PCLAI annotation of the window corresponding to the ancestry centroid |
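Rows in this layout can be parsed with the standard library alone. The sketch below reads the first nine fields and unpacks the name field into sample, haplotype, window index, and coordinates, using the example row from this README as input. The `Window` type, the regular expression, and the `min_score` threshold of 900 are assumptions made for illustration; pick a threshold that suits your analysis.

```python
import re
from typing import NamedTuple, Tuple


class Window(NamedTuple):
    chrom: str
    start: int
    end: int
    sample: str
    hap: str
    index: int
    coord: Tuple[float, ...]
    score: int
    rgb: Tuple[int, ...]


# Matches {sample}/{hap}/{chrom}_wXXXX_(x,y) as described in the field table.
NAME_RE = re.compile(
    r"^(?P<sample>[^/]+)/(?P<hap>[^/]+)/(?P<chrom>.+)_w(?P<idx>\d+)_\((?P<coords>[^)]*)\)$"
)


def parse_row(line: str) -> Window:
    """Parse one BED row (tab- or space-delimited) into a Window."""
    fields = line.split()
    match = NAME_RE.match(fields[3])
    if match is None:
        raise ValueError(f"unexpected name field: {fields[3]!r}")
    coord = tuple(float(v) for v in match["coords"].split(","))
    rgb = tuple(int(v) for v in fields[8].split(","))
    return Window(fields[0], int(fields[1]), int(fields[2]), match["sample"],
                  match["hap"], int(match["idx"]), coord, int(fields[4]), rgb)


def is_confident(window: Window, min_score: int = 900) -> bool:
    """Keep windows whose confidence score meets a user-chosen threshold."""
    return window.score >= min_score


EXAMPLE_ROW = ("chr1 14486 805864 HG00097/h1/chr1_w0001_(0.438,-1.398) "
               "991 . 14486 805864 222,162,255")
window = parse_row(EXAMPLE_ROW)
```

Any trailing fields after itemRgb, such as centroid, are ignored by this parser and would need to be read separately.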
Example BED row:
chr1 14486 805864 HG00097/h1/chr1_w0001_(0.438,-1.398) 991 . 14486 805864 222,162,255
Visualization tip: itemRgb lets you color each window by position in the embedding (e.g., mapping a 2D coordinate into a perceptual color space → RGB), so continuous shifts along the genome are visually apparent.
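To make the idea of a coordinate-to-color mapping concrete, the sketch below places a 2D coordinate on the a*/b* plane of CIELAB at a fixed lightness and converts it to sRGB with the standard D65 formulas. This is one possible perceptual mapping, not necessarily the one used to generate the distributed files; the function names and the `scale` and `lightness` parameters are assumptions chosen for this example.

```python
from typing import Tuple


def lab_to_rgb(L: float, a: float, b: float) -> Tuple[int, int, int]:
    """Convert CIELAB (D65 white point) to 8-bit sRGB, clipping out-of-gamut values."""
    fy = (L + 16.0) / 116.0
    fx = fy + a / 500.0
    fz = fy - b / 200.0

    def f_inv(t: float) -> float:
        d = 6.0 / 29.0
        return t ** 3 if t > d else 3.0 * d * d * (t - 4.0 / 29.0)

    # CIELAB -> XYZ using the D65 reference white.
    X, Y, Z = 0.95047 * f_inv(fx), f_inv(fy), 1.08883 * f_inv(fz)
    # XYZ -> linear sRGB.
    linear = (
        3.2406 * X - 1.5372 * Y - 0.4986 * Z,
        -0.9689 * X + 1.8758 * Y + 0.0415 * Z,
        0.0557 * X - 0.2040 * Y + 1.0570 * Z,
    )

    def encode(c: float) -> int:
        c = min(max(c, 0.0), 1.0)  # clip to the displayable range
        c = 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1.0 / 2.4) - 0.055
        return round(255 * c)

    return tuple(encode(c) for c in linear)


def coord_to_item_rgb(pc1: float, pc2: float,
                      scale: float = 40.0, lightness: float = 70.0) -> str:
    """Map (PC1, PC2) to an 'R,G,B' string suitable for the itemRgb field."""
    r, g, b = lab_to_rgb(lightness, scale * pc1, scale * pc2)
    return f"{r},{g},{b}"


origin_color = coord_to_item_rgb(0.0, 0.0)
shifted_color = coord_to_item_rgb(0.438, -1.398)
```

With this choice the origin of the embedding maps to a neutral gray, and direction and distance from the origin control hue and saturation, so nearby coordinates receive similar colors.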
To train PCLAI on your own data, follow the steps in our official PCLAI repo.
When using the PCLAI method or PCLAI outputs, please cite the following paper:
@article{geleta_pclai_2026,
author = {Geleta, Margarita and Mas Montserrat, Daniel and Ioannidis, Nilah M. and Ioannidis, Alexander G.},
title = {{Point cloud local ancestry inference (PCLAI): continuous coordinate-based ancestry along the genome}},
year = {2026},
 journal = {bioRxiv},
 doi = {10.64898/2026.03.23.713813}
}
