metaphlan

metaphlan packages MetaPhlAn for TAFFISH.

Package identity:

name: metaphlan
command: taf-metaphlan
kind: tool
version: 4.2.4-r1
license: Apache-2.0
upstream: https://github.com/biobakery/MetaPhlAn

What This App Packages

MetaPhlAn is a bioBakery tool for species-level taxonomic profiling from shotgun metagenomic reads. The upstream package also includes StrainPhlAn and helper scripts for strain-level population genomics and profile post-processing.

This TAFFISH app provides a fixed MetaPhlAn 4.2.4 runtime from Bioconda. It does not bundle the production ChocoPhlAn marker database; use --db_dir to point MetaPhlAn at a database directory prepared outside the image.

Scope

This app supports:

metaphlan profiling for FASTQ/FASTA/SAM/mapout-style inputs supported by upstream MetaPhlAn 4.2.4
short-read Bowtie2 mapping and long-read Minimap2 profiling through upstream options
StrainPhlAn helper commands such as sample2markers.py and strainphlan
profile utilities such as merge_metaphlan_tables.py, metaphlan2krona.py, and sgb_to_gtdb_profile.py

This app does not:

include a production MetaPhlAn/ChocoPhlAn database inside the container
download databases during smoke tests or normal wrapper startup
replace a full metagenomics workflow, read QC, host depletion, or HUMAnN analysis

Container Contents

metaphlan: default upstream command
strainphlan, sample2markers.py: StrainPhlAn workflow commands
merge_metaphlan_tables.py, metaphlan2krona.py, sgb_to_gtdb_profile.py: profile conversion and merge utilities
bowtie2, minimap2, samtools, blastn, Rscript: runtime dependencies used by MetaPhlAn and its helper scripts

The image stores package inventories under /opt/metaphlan/share/doc/metaphlan/.

Database Setup

MetaPhlAn 4.2 uses --db_dir for the local database directory. Older upstream examples may mention --bowtie2db; for this packaged version, prefer --db_dir.

Prepare a project-local database directory:

mkdir -p metaphlan-db
taf-metaphlan metaphlan --install --db_dir "$PWD/metaphlan-db"

For reproducible production work, keep the database directory with the project or use a controlled shared path, record the exact database index reported by MetaPhlAn, and pass it with --index:

taf-metaphlan reads.fq.gz \
  --input_type fastq \
  --db_dir "$PWD/metaphlan-db" \
  --index <database-index> \
  --nproc 8 \
  --mapout sample.mapout.bz2 \
  -o sample.profile.tsv

Database download requires network access at the time you run --install. Normal profiling can run offline after the database is present.
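
A minimal sketch of a pre-flight check that stops early when the database directory is missing or empty, instead of letting a profiling run fail later. The helper name db_ready and the DB_DIR variable are illustrative, not part of MetaPhlAn or TAFFISH, and the check only looks for a non-empty directory; it does not validate the database contents or the index.

```shell
# Hypothetical helper: succeed only if the directory exists and is not empty.
db_ready() {
  dir=$1
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Assumed default location, matching the project-local directory used above.
DB_DIR="${DB_DIR:-$PWD/metaphlan-db}"

if db_ready "$DB_DIR"; then
  echo "database directory looks populated: $DB_DIR"
else
  # Print the install hint from this README rather than starting a run.
  echo "no database in $DB_DIR; run: taf-metaphlan metaphlan --install --db_dir \"$DB_DIR\"" >&2
fi
```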

Usage

Show the default upstream help:

taf-metaphlan -- --help

Profile reads:

taf-metaphlan reads.fq.gz \
  --input_type fastq \
  --db_dir "$PWD/metaphlan-db" \
  --index <database-index> \
  -o profile.tsv

Profile long reads:

taf-metaphlan longreads.fq.gz \
  --input_type fastq \
  --long_reads \
  --db_dir "$PWD/metaphlan-db" \
  --index <database-index> \
  -o profile.tsv

Reuse a saved mapping file:

taf-metaphlan sample.mapout.bz2 \
  --input_type mapout \
  --db_dir "$PWD/metaphlan-db" \
  --index <database-index> \
  -o profile.tsv

Merge profile tables:

taf-metaphlan merge_metaphlan_tables.py sample1.profile.tsv sample2.profile.tsv > merged.tsv
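
For more than a couple of samples, the profiling call can be wrapped in a loop so every sample uses the same database directory and index. The following is a sketch only: run, profile_all, DRY_RUN, DB_DIR, and DB_INDEX are illustrative names, and the input files are assumed to be named <sample>.fq.gz. The resulting *.profile.tsv files can then be combined with the merge command above.

```shell
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Profile each FASTQ file given as an argument, writing <sample>.profile.tsv
# to the current directory. DB_DIR and DB_INDEX must be set by the caller.
profile_all() {
  for reads in "$@"; do
    sample=$(basename "$reads" .fq.gz)
    run taf-metaphlan "$reads" \
      --input_type fastq \
      --db_dir "$DB_DIR" \
      --index "$DB_INDEX" \
      -o "$sample.profile.tsv"
  done
}
```

Setting DRY_RUN=1 before calling profile_all sample1.fq.gz sample2.fq.gz prints the commands without starting any container, which is a cheap way to review paths and the index before a long run.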

Run an explicit packaged command:

taf-metaphlan strainphlan --help
taf-metaphlan sample2markers.py --help

Command Mode

TAFFISH tool apps support automatic command mode. If the first argument is a command name rather than an option, the wrapper runs that executable inside the same container: taf-metaphlan merge_metaphlan_tables.py ... runs merge_metaphlan_tables.py.

Use taf-metaphlan -- --help, taf-metaphlan -- --version, or taf-metaphlan metaphlan --install ... when the first upstream argument starts with -.
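
The calling convention above can be captured in a small shell function. This is a sketch of the rule as this README describes it, not of how TAFFISH dispatches internally; mpa and DRY_RUN are hypothetical names. It always uses the -- form for leading options, so the explicit taf-metaphlan metaphlan ... form remains the alternative shown elsewhere in this README.

```shell
# Hypothetical wrapper: insert "--" when the first upstream argument is an option.
mpa() {
  case "${1-}" in
    -*) set -- taf-metaphlan -- "$@" ;;   # e.g. --help, --version
    *)  set -- taf-metaphlan "$@" ;;      # input file or packaged command name
  esac
  # With DRY_RUN=1, print the final command line instead of executing it.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```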

Inputs

Input	Meaning	Notes
FASTQ/FASTA reads	metagenomic reads	use `--input_type fastq`, `fasta`, or another upstream-supported type
SAM/mapout	saved mapping output	MetaPhlAn 4.2 uses `--input_type mapout` for saved mapout files
database directory	MetaPhlAn marker database	pass with `--db_dir`; the database is external to the image

Output Notes

The main output is a MetaPhlAn profile table with clades, taxonomic identifiers, relative abundances, and version/database header metadata. Optional outputs include saved mapping files (--mapout), BIOM-format profile output, Krona inputs, merged abundance tables, and StrainPhlAn marker/consensus outputs.
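
As an illustration of working with the profile table, the sketch below pulls out species-level rows with awk. It assumes a tab-separated file whose header and metadata lines start with #, with the clade name in column 1 and the relative abundance in column 3; check the header of your own profile before relying on those positions. species_rows is a hypothetical helper name.

```shell
# Hypothetical helper: print clade name and relative abundance for rows whose
# clade path ends at the species level (last component starts with s__).
species_rows() {
  awk -F '\t' '
    /^#/ { next }                              # skip header and metadata lines
    $1 ~ /\|s__[^|]*$/ { print $1 "\t" $3 }    # no deeper level after s__
  ' "$1"
}
```

Rows that continue past the species component (for example with a trailing t__ level) are excluded by the pattern, so each species is listed once.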

Resources, Databases, and Platform

MetaPhlAn production databases are large and versioned independently of this wrapper. Keep them outside the image, mount or work from a writable project directory, and pass the path with --db_dir. Real runs may need substantial memory and disk space, especially during database installation or large multi-sample analyses.

This package declares linux/amd64 only because the current Bioconda package metadata does not advertise additional platforms for metaphlan. On Apple Silicon or other arm64 hosts, Docker/Podman can run it through amd64 emulation.

Boundaries

The smoke test verifies the packaged runtime, key helper commands, Python/R and native dependency availability, a tiny FASTQ read-statistics path, table merging, and a controlled missing-database failure. It intentionally does not download the MetaPhlAn database or validate taxonomic accuracy on real metagenomes.

Troubleshooting

If MetaPhlAn reports that a database is missing, run taf-metaphlan metaphlan --install --db_dir <dir> or pass the correct existing --db_dir and --index.
If an old tutorial uses --bowtie2db, translate that option to --db_dir for MetaPhlAn 4.2.
If an old tutorial uses --bowtie2out, translate the saved mapping workflow to --mapout and --input_type mapout.
If a long-read run fails, confirm --long_reads is present and inspect metaphlan --help for current Minimap2-related thresholds.
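
When adapting commands from older tutorials, the renames listed above can be applied mechanically. The sketch below is a hypothetical helper, translate_legacy_args, that covers only the renames named in this section and only the space-separated "--flag value" form; review the result against metaphlan --help before running it.

```shell
# Hypothetical helper: print the arguments one per line, with legacy option
# names replaced by the names used in this packaged MetaPhlAn version.
translate_legacy_args() {
  for arg in "$@"; do
    case "$arg" in
      --bowtie2db)  printf '%s\n' --db_dir ;;
      --bowtie2out) printf '%s\n' --mapout ;;
      bowtie2out)   printf '%s\n' mapout ;;   # legacy value of --input_type
      *)            printf '%s\n' "$arg" ;;
    esac
  done
}
```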

Testing

The smoke test covers:

wrapper metadata and help
MetaPhlAn 4.2.4 runtime identity
key helper commands and aligner/runtime dependencies
a small offline functional path that does not need the production database
an explicit missing-database error path

It does not replace full scientific validation on production datasets.

License and Citation

TAFFISH app packaging: Apache-2.0.

Upstream MetaPhlAn is MIT-licensed. Cite MetaPhlAn according to upstream guidance, including the MetaPhlAn 4 Nature Biotechnology paper and the StrainPhlAn/MetaPhlAn method papers when applicable.
