taffish/metaphlan

metaphlan

metaphlan packages MetaPhlAn for TAFFISH.

Package identity:

What This App Packages

MetaPhlAn is a bioBakery tool for species-level taxonomic profiling from shotgun metagenomic reads. The upstream package also includes StrainPhlAn and helper scripts for strain-level population genomics and profile post-processing.

This TAFFISH app provides a fixed MetaPhlAn 4.2.4 runtime from Bioconda. It does not bundle the production ChocoPhlAn marker database; use --db_dir to point MetaPhlAn at a database directory prepared outside the image.

Scope

This app supports:

  • metaphlan profiling for FASTQ/FASTA/SAM/mapout-style inputs supported by upstream MetaPhlAn 4.2.4
  • short-read Bowtie2 mapping and long-read Minimap2 profiling through upstream options
  • StrainPhlAn helper commands such as sample2markers.py and strainphlan
  • profile utilities such as merge_metaphlan_tables.py, metaphlan2krona.py, and sgb_to_gtdb_profile.py

This app does not:

  • include a production MetaPhlAn/ChocoPhlAn database inside the container
  • download databases during smoke tests or normal wrapper startup
  • replace a full metagenomics workflow, read QC, host depletion, or HUMAnN analysis

Container Contents

  • metaphlan: default upstream command
  • strainphlan, sample2markers.py: StrainPhlAn workflow commands
  • merge_metaphlan_tables.py, metaphlan2krona.py, sgb_to_gtdb_profile.py: profile conversion and merge utilities
  • bowtie2, minimap2, samtools, blastn, Rscript: runtime dependencies used by MetaPhlAn and its helper scripts

The image stores package inventories under /opt/metaphlan/share/doc/metaphlan/.

Database Setup

MetaPhlAn 4.2 uses --db_dir for the local database directory. Older upstream examples may mention --bowtie2db; for this packaged version, prefer --db_dir.

Prepare a project-local database directory:

mkdir -p metaphlan-db
taf-metaphlan metaphlan --install --db_dir "$PWD/metaphlan-db"

For reproducible production work, keep the database directory with the project or use a controlled shared path, record the exact database index reported by MetaPhlAn, and pass it with --index:

taf-metaphlan reads.fq.gz \
  --input_type fastq \
  --db_dir "$PWD/metaphlan-db" \
  --index <database-index> \
  --nproc 8 \
  --mapout sample.mapout.bz2 \
  -o sample.profile.tsv

Database download requires network access at the time you run --install. Normal profiling can run offline after the database is present.
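One way to recover the index name for your records is to read it from the database directory itself. The following is a minimal sketch, not part of the wrapper: list_db_indexes is a hypothetical helper, and it assumes each installed index leaves a metadata file named <index>.pkl in the directory. Check your own database directory before relying on that layout.

```shell
# Hypothetical helper: print the index names found in a MetaPhlAn database directory.
# Assumption: every installed index has a metadata file named <index>.pkl.
list_db_indexes() {
  db_dir=$1
  for pkl in "$db_dir"/*.pkl; do
    # Skip the unexpanded pattern when the directory holds no .pkl files.
    [ -e "$pkl" ] || continue
    basename "$pkl" .pkl
  done
}

# Example (commented out so that sourcing this file has no side effects):
# list_db_indexes "$PWD/metaphlan-db" > metaphlan-db.index.txt
```

Saving the printed name next to the project makes it easy to pass the same value to --index on later runs.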

Usage

Show the default upstream help:

taf-metaphlan -- --help

Profile reads:

taf-metaphlan reads.fq.gz \
  --input_type fastq \
  --db_dir "$PWD/metaphlan-db" \
  --index <database-index> \
  -o profile.tsv

Profile long reads:

taf-metaphlan longreads.fq.gz \
  --input_type fastq \
  --long_reads \
  --db_dir "$PWD/metaphlan-db" \
  --index <database-index> \
  -o profile.tsv

Reuse a saved mapping file:

taf-metaphlan sample.mapout.bz2 \
  --input_type mapout \
  --db_dir "$PWD/metaphlan-db" \
  --index <database-index> \
  -o profile.tsv

Merge profile tables:

taf-metaphlan merge_metaphlan_tables.py sample1.profile.tsv sample2.profile.tsv > merged.tsv
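The profiling and merge commands above can be combined for several samples. This is a sketch under stated assumptions: profile_sample and run are hypothetical helper names, the sample files are placeholders, and DB_INDEX must be set to the index you recorded. Setting DRY_RUN=1 prints each command instead of running it.

```shell
# Placeholders: adjust DB_DIR and DB_INDEX to your own database.
DB_DIR=${DB_DIR:-"$PWD/metaphlan-db"}
DB_INDEX=${DB_INDEX:-"<database-index>"}

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Profile one gzipped FASTQ file; output names derive from the file name.
profile_sample() {
  reads=$1
  sample=$(basename "$reads" .fq.gz)
  run taf-metaphlan "$reads" \
    --input_type fastq \
    --db_dir "$DB_DIR" \
    --index "$DB_INDEX" \
    --mapout "$sample.mapout.bz2" \
    -o "$sample.profile.tsv"
}

# Example (commented out so that sourcing this file runs nothing):
# for reads in sample1.fq.gz sample2.fq.gz; do profile_sample "$reads"; done
# taf-metaphlan merge_metaphlan_tables.py sample1.profile.tsv sample2.profile.tsv > merged.tsv
```

Keeping the per-sample mapout files lets you re-profile later with --input_type mapout without repeating the mapping step.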

Run an explicit packaged command:

taf-metaphlan strainphlan --help
taf-metaphlan sample2markers.py --help

Command Mode

TAFFISH tool apps support automatic command mode. When the leading argument is a command name rather than an option, as in taf-metaphlan merge_metaphlan_tables.py ..., the wrapper runs that executable inside the same container.

When the first upstream argument starts with -, either separate it with -- (taf-metaphlan -- --help, taf-metaphlan -- --version) or name the command explicitly (taf-metaphlan metaphlan --install ...).
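The dispatch rules described here can be illustrated with a small model. This is not the TAFFISH implementation: choose_command is a hypothetical function, the idea that a leading option without -- is handled by the wrapper is an assumption, and the executable lookup uses the local PATH rather than the container.

```shell
# Hypothetical model of command-mode dispatch; NOT the actual wrapper code.
# Prints which program would receive the arguments.
choose_command() {
  default_cmd=metaphlan
  case "${1-}" in
    --)
      echo "$default_cmd"        # explicit separator: the rest goes to the default command
      ;;
    -*)
      echo wrapper               # assumption: leading options belong to the wrapper itself
      ;;
    *)
      if [ -n "${1-}" ] && command -v "$1" >/dev/null 2>&1; then
        echo "$1"                # leading argument names an executable
      else
        echo "$default_cmd"      # e.g. an input file such as reads.fq.gz
      fi
      ;;
  esac
}
```

Under this model, choose_command -- --help selects metaphlan, while a leading executable name selects that executable.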

Inputs

| Input | Meaning | Notes |
| --- | --- | --- |
| FASTQ/FASTA reads | metagenomic reads | use --input_type fastq, fasta, or another upstream-supported type |
| SAM/mapout | saved mapping output | MetaPhlAn 4.2 uses --input_type mapout for saved mapout files |
| database directory | MetaPhlAn marker database | pass with --db_dir; the database is external to the image |
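In scripts it can help to derive --input_type from the file name. The sketch below uses a hypothetical helper, guess_input_type. The extension list is only a convention chosen for this example, and the sam value is an assumption to confirm against metaphlan --help for the packaged version.

```shell
# Hypothetical helper: map a file name to a value for --input_type.
# The extension patterns are illustrative conventions, not upstream rules.
guess_input_type() {
  case "$1" in
    *.fastq|*.fq|*.fastq.gz|*.fq.gz|*.fastq.bz2|*.fq.bz2) echo fastq ;;
    *.fasta|*.fa|*.fna|*.fasta.gz|*.fa.gz|*.fna.gz)       echo fasta ;;
    *.sam)                                                echo sam ;;    # assumed value; verify upstream
    *.mapout|*.mapout.bz2)                                echo mapout ;;
    *)
      echo "unknown input type for: $1" >&2
      return 1
      ;;
  esac
}

# Example (commented out):
# taf-metaphlan reads.fq.gz --input_type "$(guess_input_type reads.fq.gz)" ...
```

Failing on unknown extensions, rather than defaulting to fastq, avoids silently profiling a file with the wrong parser.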

Output Notes

The main output is a MetaPhlAn profile table with clades, taxonomic identifiers, relative abundances, and version/database header metadata. Optional outputs include saved mapping files (--mapout), BIOM-format profile output, Krona inputs, merged abundance tables, and StrainPhlAn marker/consensus outputs.
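As an illustration of working with the profile table, the sketch below extracts species-level rows. It assumes a tab-separated layout with # header lines, the clade path in column 1 with levels separated by |, and the relative abundance in column 3. species_rows is a hypothetical helper name; verify the column order against the header of your own output.

```shell
# Hypothetical helper: print "<species>\t<relative abundance>" for species-level rows.
# Assumptions: tab-separated table, '#' header lines, clade path in column 1,
# relative abundance in column 3.
species_rows() {
  awk -F '\t' '
    /^#/ { next }                      # skip header and metadata lines
    $1 ~ /[|]s__[^|]*$/ {              # clade path ends at the species level
      n = split($1, clade, "[|]")
      print clade[n] "\t" $3
    }
  ' "$1"
}

# Example (commented out):
# species_rows sample.profile.tsv | sort -k2,2nr | head
```

Rows that continue below the species level (for example with a t__ component) are skipped because their clade path does not end in an s__ entry.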

Resources, Databases, and Platform

MetaPhlAn production databases are large and versioned independently of this wrapper. Keep them outside the image, mount or work from a writable project directory, and pass the path with --db_dir. Real runs may need substantial memory and disk space, especially during database installation or large multi-sample analyses.

This package declares only linux/amd64, because the current Bioconda package metadata does not advertise additional platforms for metaphlan. On Apple Silicon or other arm64 hosts, Docker/Podman can run it through amd64 emulation.

Boundaries

The smoke test verifies the packaged runtime, key helper commands, Python/R and native dependency availability, a tiny FASTQ read-statistics path, table merging, and a controlled missing-database failure. It intentionally does not download the MetaPhlAn database or validate taxonomic accuracy on real metagenomes.

Troubleshooting

  • If MetaPhlAn reports that a database is missing, run taf-metaphlan metaphlan --install --db_dir <dir> or pass the correct existing --db_dir and --index.
  • If an old tutorial uses --bowtie2db, translate that option to --db_dir for MetaPhlAn 4.2.
  • If an old tutorial uses --bowtie2out, translate the saved mapping workflow to --mapout and --input_type mapout.
  • If a long-read run fails, confirm --long_reads is present and inspect metaphlan --help for current Minimap2-related thresholds.
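The option translations listed above can be applied mechanically when adapting an old command line. This is a minimal sketch with a hypothetical helper, translate_legacy_args. It only renames the options mentioned in this section and does not check that the resulting command is valid for MetaPhlAn 4.2.

```shell
# Hypothetical helper: rewrite legacy MetaPhlAn arguments, one per output line.
# Covers only the renames described in this README:
#   --bowtie2db -> --db_dir, --bowtie2out -> --mapout,
#   --input_type bowtie2out -> --input_type mapout
translate_legacy_args() {
  prev=
  for arg in "$@"; do
    case "$arg" in
      --bowtie2db)  arg=--db_dir ;;
      --bowtie2out) arg=--mapout ;;
      bowtie2out)   [ "$prev" = --input_type ] && arg=mapout ;;
    esac
    printf '%s\n' "$arg"
    prev=$arg
  done
}

# Example (commented out):
# translate_legacy_args reads.fq --bowtie2db db --bowtie2out sample.bz2
```

Printing one argument per line keeps arguments that contain spaces distinguishable when you review the translated command.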

Testing

The smoke test covers:

  • wrapper metadata and help
  • MetaPhlAn 4.2.4 runtime identity
  • key helper commands and aligner/runtime dependencies
  • a small offline functional path that does not need the production database
  • an explicit missing-database error path

It does not replace full scientific validation on production datasets.

License and Citation

TAFFISH app packaging: Apache-2.0.

Upstream MetaPhlAn is MIT-licensed. Cite MetaPhlAn according to upstream guidance, including the MetaPhlAn 4 Nature Biotechnology paper and the StrainPhlAn/MetaPhlAn method papers when applicable.
