ayyucedemirbas/scAnalyzer

scAnalyzer: A Single-Cell Analysis Toolkit

A Python toolkit for single-cell RNA sequencing (scRNA-seq) analysis.

🚧 Warning: this project is under heavy development and is not ready for production. Breaking API changes can happen frequently until a stable version is reached. 🚧

scAnalyzer is an integrated toolkit designed for scalable and memory-efficient single-cell RNA sequencing (scRNA-seq) data analysis. Built around a custom, highly optimized SingleCellDataset core, it seamlessly bridges foundational preprocessing with advanced downstream analyses, including trajectory inference, batch correction, and interactive 3D visualizations.

✨ Key Features

  • 📦 Memory-Efficient Core: Custom SingleCellDataset supporting sparse matrices (CSR/CSC) and HDF5 (.h5ad) I/O operations natively.
  • 🧹 Robust Preprocessing: Automated QC, MAD-based outlier detection, doublet prediction (via Scrublet), and cell-cycle scoring.
  • 🔄 Batch Correction: Built-in support for multiple integration algorithms including Harmony, ComBat, and MNN.
  • 🗺️ Dimensionality Reduction & Clustering: PCA, UMAP, t-SNE, PHATE, and Diffusion Maps. Supports graph-based (Leiden, Louvain) and distance-based clustering (K-Means, DBSCAN, Hierarchical).
  • 📊 Differential Expression: Vectorized marker gene identification (t-test, Wilcoxon) and gene set enrichment analysis (hypergeometric test, GSEA).
  • 🛤️ Trajectory Inference: Dynamic cellular lineage tracking using Diffusion Pseudotime (DPT) with automated branch detection.
  • 🎨 Interactive Visualizations: Publication-ready static plots (Matplotlib/Seaborn) and dynamic, browser-based visualizations (Plotly 3D embeddings, interactive heatmaps).
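
As an illustration of the MAD-based outlier detection mentioned under Robust Preprocessing, here is a minimal NumPy sketch. It is not scAnalyzer's implementation: the function name `mad_outliers`, the `n_mads=5.0` threshold, and the toy counts are assumptions chosen for the example.

```python
import numpy as np

def mad_outliers(values, n_mads=5.0):
    """Flag entries more than n_mads scaled MADs away from the median."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    scaled_mad = 1.4826 * mad
    if scaled_mad == 0:
        # No spread at all: nothing can be called an outlier.
        return np.zeros(values.shape, dtype=bool)
    return np.abs(values - median) > n_mads * scaled_mad

# Hypothetical number of detected genes per cell; the last cell is extreme.
genes_per_cell = np.array([1200, 1350, 1100, 1275, 1320, 1190, 9000])
mask = mad_outliers(genes_per_cell)
print(mask)
```

Because the median and MAD are barely affected by a few extreme cells, the threshold stays stable even when the outliers themselves are large, which is why this rule is often preferred over mean and standard deviation cutoffs for QC metrics.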

🚀 Installation

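Install the package directly from PyPI. Note that the distribution and import name is scAnalysis, while the project itself is called scAnalyzer: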

pip install scAnalysis

For interactive visualizations, ensure plotly is installed. For Leiden/Louvain clustering, leidenalg, louvain, and igraph are required.
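
To check which of these optional packages are present before running an analysis, a small standard-library sketch like the following can help. The helper name `missing_optional_dependencies` and the grouping dictionary are assumptions made for this example; they are not part of scAnalyzer.

```python
from importlib.util import find_spec

# Import names of the optional packages listed above.
OPTIONAL_DEPENDENCIES = {
    "interactive visualizations": ["plotly"],
    "Leiden/Louvain clustering": ["leidenalg", "louvain", "igraph"],
}

def missing_optional_dependencies(groups=OPTIONAL_DEPENDENCIES):
    """Return {feature: [missing import names]} for incomplete features."""
    missing = {}
    for feature, modules in groups.items():
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            missing[feature] = absent
    return missing

for feature, modules in missing_optional_dependencies().items():
    print(f"{feature}: missing {', '.join(modules)}")
```

Nothing is printed when every optional package is importable.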

💡 Quick Start

Here is a minimal example demonstrating a standard scRNA-seq workflow using scAnalyzer:

import scAnalysis as sca

1. Load Data

adata = sca.sc_io.read_10x_mtx('data/filtered_gene_bc_matrices/hg19')

2. Preprocessing & QC

sca.preprocessing.calculate_qc_metrics(adata, qc_vars=['MT-'])
adata = sca.preprocessing.filter_cells(adata, min_genes=200, max_pct_mito=5.0)
adata = sca.preprocessing.filter_genes(adata, min_cells=3)
sca.preprocessing.normalize_total(adata, target_sum=1e4)
sca.preprocessing.log1p(adata)
sca.preprocessing.highly_variable_genes(adata, n_top_genes=2000)

3. Dimensionality Reduction

sca.dimensionality.run_pca(adata, n_components=50)
sca.dimensionality.neighbors(adata, n_neighbors=10, n_pcs=40)
sca.dimensionality.run_umap(adata, min_dist=0.3)

4. Clustering & Differential Expression

sca.clustering.cluster_leiden(adata, resolution=0.5, key_added='leiden')
sca.differential.rank_genes_groups(adata, groupby='leiden', method='t-test')

5. Visualization

sca.visualization.plot_umap(adata, color='leiden', save='umap_clusters.png')
sca.visualization.plot_dotplot(adata, var_names=['CD3E', 'MS4A1', 'CD14'], groupby='leiden')
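
To make step 2 more concrete, the sketch below shows what total-count normalization followed by a log1p transform conventionally computes. It assumes normalize_total and log1p follow the common scRNA-seq convention (scale each cell to target_sum, then take the natural log of 1 + x); the README does not confirm this, and the helper name `scale_to_target_sum` and the toy matrix are hypothetical. It uses a dense NumPy array for brevity.

```python
import numpy as np

def scale_to_target_sum(counts, target_sum=1e4):
    """Scale each cell (row) so that its counts sum to target_sum."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Cells with zero total keep a scale of 1 to avoid division by zero.
    scale = np.divide(target_sum, totals,
                      out=np.ones_like(totals), where=totals > 0)
    return counts * scale

# Toy matrix: 3 cells x 4 genes (hypothetical raw counts).
counts = np.array([
    [10, 0, 5, 5],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])
normalized = scale_to_target_sum(counts, target_sum=1e4)
logged = np.log1p(normalized)
print(normalized.sum(axis=1))
```

After scaling, every non-empty cell has the same total, so differences in sequencing depth no longer dominate the comparison between cells. Zeros stay zero under both steps, which is what allows the same operations to be applied to sparse matrices without densifying them.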

🏗️ Architecture & Modules

The framework is highly modular, allowing you to use only the components you need:

scAnalysis.core: Contains the base SingleCellDataset data structure.

scAnalysis.preprocessing: Filtering, normalization, and HVG selection.

scAnalysis.quality_control: Scrublet doublet detection and outlier filtering.

scAnalysis.dimensionality: PCA, UMAP, t-SNE, DiffMap, PHATE.

scAnalysis.clustering: K-Means, Leiden, Louvain, Spectral, DBSCAN.

scAnalysis.differential: Vectorized stats for marker discovery.

scAnalysis.enrichment: Gene set scoring, MSigDB integration, GSEA.

scAnalysis.trajectory: Root cell selection, DPT, branching.

scAnalysis.visualization: Static plotting (Violin, Dotplot, Heatmap, Volcano).

scAnalysis.interactive_viz: Plotly-powered interactive UI.

scAnalysis.sc_io: Native 10x MTX, CSV, and .h5ad read/write support.
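
To illustrate what "vectorized stats" can mean for marker discovery, here is a minimal NumPy sketch that computes a Welch t-statistic for every gene at once, comparing one group of cells against all others. It is an assumption about the general technique, not the code in scAnalysis.differential; the function name `welch_t_statistics` and the simulated data are hypothetical.

```python
import numpy as np

def welch_t_statistics(expr, in_group):
    """Welch t-statistic per gene: cells in the group vs. all other cells.

    expr: (n_cells, n_genes) array of log-normalized expression.
    in_group: boolean mask of length n_cells.
    """
    expr = np.asarray(expr, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    a, b = expr[in_group], expr[~in_group]
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    # Unbiased variances (ddof=1), computed for all genes in one pass.
    se = np.sqrt(a.var(axis=0, ddof=1) / a.shape[0]
                 + b.var(axis=0, ddof=1) / b.shape[0])
    # Genes with zero variance in both groups get a statistic of 0.
    return np.divide(mean_diff, se,
                     out=np.zeros_like(mean_diff), where=se > 0)

# Simulated data: 200 cells x 5 genes, gene 2 up-regulated in the group.
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 5))
labels = np.repeat([True, False], 100)
expr[labels, 2] += 3.0
t_stats = welch_t_statistics(expr, labels)
top_gene = int(np.argmax(t_stats))
print(top_gene)
```

Working on whole columns avoids a Python loop over genes, which matters when the matrix has tens of thousands of genes and the test is repeated for every cluster.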

🧪 Testing

The package includes a comprehensive suite of unit tests. To run the tests locally:

python -m unittest discover scAnalysis/ -p "test_*.py"
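
For reference, a file matching the `test_*.py` pattern might look like the sketch below. The file name `test_log_transform.py` and the test case are hypothetical and only exercise the standard library; real tests in the repository would import from scAnalysis instead.

```python
# Hypothetical file: scAnalysis/test_log_transform.py
import math
import unittest

class TestLog1pTransform(unittest.TestCase):
    def test_zero_counts_stay_zero(self):
        # log(1 + 0) is exactly 0, so zero counts are preserved.
        self.assertEqual(math.log1p(0.0), 0.0)

    def test_round_trip(self):
        # expm1 inverts log1p, so the original value is recoverable.
        for value in (0.5, 10.0, 1e4):
            self.assertAlmostEqual(math.expm1(math.log1p(value)), value, places=6)
```

Test discovery collects every `unittest.TestCase` subclass in matching files, so no `if __name__ == "__main__"` block is needed.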

🤝 Contributing

Contributions are welcome! If you find a bug or want to suggest a new feature, please open an issue or submit a pull request.

🤖 Future Enhancements / To-Do List

  • Implement Imputation Module (Dropout Handling)

    • Context: The current scAnalysis package lacks a dedicated module to handle missing data and technical dropouts.

    • Task: Develop an imputation workflow to infer missing values and correct for zero-inflation.

    • References: Investigate integrating or replicating methodologies like SAVER (Poisson LASSO strategy) or scVI (Variational Autoencoders).

  • Add Automated Cell Type Annotation & Projection

    • Context: Currently, cell type assignment relies on a manual, marker-based approach using gene set scoring (enrichment.py).

    • Task: Implement automated, classifier-based annotation tools that can predict cell types directly from reference datasets.

    • References: Consider integrating projection algorithms like scmap or regularized regression classifiers like Garnett.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
