metaphlan packages MetaPhlAn for TAFFISH.
Package identity:
- name: metaphlan
- command: taf-metaphlan
- kind: tool
- version: 4.2.4-r1
- license: Apache-2.0
- upstream: https://github.com/biobakery/MetaPhlAn
MetaPhlAn is a bioBakery tool for species-level taxonomic profiling from shotgun metagenomic reads. The upstream package also includes StrainPhlAn and helper scripts for strain-level population genomics and profile post-processing.
This TAFFISH app provides a fixed MetaPhlAn 4.2.4 runtime from Bioconda. It
does not bundle the production ChocoPhlAn marker database; use --db_dir to
point MetaPhlAn at a database directory prepared outside the image.
This app supports:
- metaphlan profiling for FASTQ/FASTA/SAM/mapout-style inputs supported by upstream MetaPhlAn 4.2.4
- short-read Bowtie2 mapping and long-read Minimap2 profiling through upstream options
- StrainPhlAn helper commands such as sample2markers.py and strainphlan
- profile utilities such as merge_metaphlan_tables.py, metaphlan2krona.py, and sgb_to_gtdb_profile.py
This app does not:
- include a production MetaPhlAn/ChocoPhlAn database inside the container
- download databases during smoke tests or normal wrapper startup
- replace a full metagenomics workflow, read QC, host depletion, or HUMAnN analysis
Packaged commands:
- metaphlan: default upstream command
- strainphlan, sample2markers.py: StrainPhlAn workflow commands
- merge_metaphlan_tables.py, metaphlan2krona.py, sgb_to_gtdb_profile.py: profile conversion and merge utilities
- bowtie2, minimap2, samtools, blastn, Rscript: runtime dependencies used by MetaPhlAn and its helper scripts
The image stores package inventories under /opt/metaphlan/share/doc/metaphlan/.
MetaPhlAn 4.2 uses --db_dir for the local database directory. Older upstream
examples may mention --bowtie2db; for this packaged version, prefer
--db_dir.
Prepare a project-local database directory:

    mkdir -p metaphlan-db
    taf-metaphlan metaphlan --install --db_dir "$PWD/metaphlan-db"

For reproducible production work, keep the database directory with the project
or use a controlled shared path, record the exact database index reported by
MetaPhlAn, and pass it with --index:

    taf-metaphlan reads.fq.gz \
      --input_type fastq \
      --db_dir "$PWD/metaphlan-db" \
      --index <database-index> \
      --nproc 8 \
      --mapout sample.mapout.bz2 \
      -o sample.profile.tsv

Database download requires network access at the time you run --install.
Normal profiling can run offline after the database is present.
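
One way to follow the advice about recording the index is to keep it in a small text file next to the database. The sketch below is illustrative only: the file name INDEX.txt and both function names are assumptions made here, not something MetaPhlAn or TAFFISH creates or reads.

```shell
#!/bin/sh
# Hypothetical helpers: store the database index name beside the database
# so that later runs can pass the same value to --index.
# INDEX.txt is an assumed file name, not a MetaPhlAn convention.

record_index() {
  # $1 = database directory, $2 = index name reported by MetaPhlAn
  mkdir -p "$1" && printf '%s\n' "$2" > "$1/INDEX.txt"
}

read_index() {
  # $1 = database directory; prints the recorded index name
  cat "$1/INDEX.txt"
}

# Possible usage after a successful --install:
#   record_index "$PWD/metaphlan-db" "<database-index>"
#   taf-metaphlan reads.fq.gz --input_type fastq \
#     --db_dir "$PWD/metaphlan-db" \
#     --index "$(read_index "$PWD/metaphlan-db")" -o profile.tsv
```

Keeping the value in a file that travels with the database directory avoids retyping it in every script, at the cost of one extra file to keep in sync if the database is replaced.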
Access default upstream help:

    taf-metaphlan -- --help

Profile reads:

    taf-metaphlan reads.fq.gz \
      --input_type fastq \
      --db_dir "$PWD/metaphlan-db" \
      --index <database-index> \
      -o profile.tsv

Profile long reads:

    taf-metaphlan longreads.fq.gz \
      --input_type fastq \
      --long_reads \
      --db_dir "$PWD/metaphlan-db" \
      --index <database-index> \
      -o profile.tsv

Reuse a saved mapping file:

    taf-metaphlan sample.mapout.bz2 \
      --input_type mapout \
      --db_dir "$PWD/metaphlan-db" \
      --index <database-index> \
      -o profile.tsv

Merge profile tables:

    taf-metaphlan merge_metaphlan_tables.py sample1.profile.tsv sample2.profile.tsv > merged.tsv

Run an explicit packaged command:

    taf-metaphlan strainphlan --help
    taf-metaphlan sample2markers.py --help

TAFFISH tool apps support automatic command mode. For non-option leading
commands, taf-metaphlan merge_metaphlan_tables.py ... runs that executable
inside the same container.
Use taf-metaphlan -- --help, taf-metaphlan -- --version, or
taf-metaphlan metaphlan --install ... when the first upstream argument starts
with -.
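
The rule above can be wrapped in a small shell helper so that scripts do not have to remember when -- is needed. This is a sketch under the assumption that a leading option is the only special case; the function name taf_mp_argv is hypothetical, and it only prints the command line rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the taf-metaphlan command line for a set of
# upstream arguments, inserting "--" when the first argument is an option.
taf_mp_argv() {
  case "${1-}" in
    --) printf '%s' "taf-metaphlan" ;;     # caller already supplied "--"
    -*) printf '%s' "taf-metaphlan --" ;;  # leading option: add "--"
    *)  printf '%s' "taf-metaphlan" ;;     # executable name or input file
  esac
  # Append the upstream arguments unchanged. No quoting is added, so the
  # output is meant for display or logging, not for eval.
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

# Examples:
#   taf_mp_argv --version
#   taf_mp_argv merge_metaphlan_tables.py a.tsv b.tsv
```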
| Input | Meaning | Notes |
|---|---|---|
| FASTQ/FASTA reads | metagenomic reads | use --input_type fastq, fasta, or another upstream-supported type |
| SAM/mapout | saved mapping output | MetaPhlAn 4.2 uses --input_type mapout for saved mapout files |
| database directory | MetaPhlAn marker database | pass with --db_dir; the database is external to the image |
The main output is a MetaPhlAn profile table with clades, taxonomic identifiers,
relative abundances, and version/database header metadata. Optional outputs
include saved mapping files (--mapout), BIOM-format profile output, Krona
inputs, merged abundance tables, and StrainPhlAn marker/consensus outputs.
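
To capture the version and database metadata mentioned above, the header can be separated from the abundance rows. The sketch assumes that header lines are the leading lines starting with #; check this against a real profile from your MetaPhlAn version. Both function names are hypothetical.

```shell
#!/bin/sh
# Print the leading comment lines (assumed to start with "#") of a profile.
# Stops at the first line that does not start with "#".
profile_header() {
  awk '/^#/ { print; next } { exit }' "$1"
}

# Print every line that does not start with "#", i.e. the abundance rows.
profile_rows() {
  grep -v '^#' "$1"
}

# Possible usage:
#   profile_header sample.profile.tsv > sample.profile.header.txt
#   profile_rows sample.profile.tsv | cut -f 1 | head
```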
MetaPhlAn production databases are large and versioned independently of this
wrapper. Keep them outside the image, mount or work from a writable project
directory, and pass the path with --db_dir. Real runs may need substantial
memory and disk space, especially during database installation or large
multi-sample analyses.
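
Before running --install it can help to check free space in the target directory. The sketch below uses POSIX df; the function names are hypothetical, and NEEDED_KB in the usage comment is a placeholder because no size figure is given here.

```shell
#!/bin/sh
# Print free space, in kilobytes, on the filesystem holding directory $1.
free_kb() {
  # -P forces the POSIX one-line-per-filesystem format; -k reports
  # 1024-byte blocks. Column 4 of the data line is the available space.
  # Assumes the filesystem name contains no spaces.
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if directory $1 has at least $2 kilobytes free.
has_free_kb() {
  [ "$(free_kb "$1")" -ge "$2" ]
}

# Possible usage (NEEDED_KB is a value you choose):
#   has_free_kb "$PWD/metaphlan-db" "$NEEDED_KB" ||
#     echo "not enough free space for the database" >&2
```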
This package declares linux/amd64 only because the current Bioconda package
metadata does not advertise additional platforms for metaphlan. On Apple
Silicon or other arm64 hosts, Docker/Podman can run it through amd64 emulation.
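
A script can warn about emulation by looking at the host architecture. This sketch relies only on uname -m; the function name is hypothetical, and the list of amd64 spellings is an assumption that may not cover every platform.

```shell
#!/bin/sh
# Succeed (exit status 0) when architecture string $1 is not amd64,
# meaning a linux/amd64 image would run under emulation on that host.
needs_amd64_emulation() {
  case "$1" in
    x86_64|amd64) return 1 ;;
    *)            return 0 ;;
  esac
}

# Warn on the current host if it is not amd64.
if needs_amd64_emulation "$(uname -m)"; then
  echo "note: non-amd64 host; the container will run under emulation" >&2
fi
```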
The smoke test verifies the packaged runtime, key helper commands, Python/R and native dependency availability, a tiny FASTQ read-statistics path, table merging, and a controlled missing-database failure. It intentionally does not download the MetaPhlAn database or validate taxonomic accuracy on real metagenomes.
- If MetaPhlAn reports that a database is missing, run taf-metaphlan metaphlan --install --db_dir <dir> or pass the correct existing --db_dir and --index.
- If an old tutorial uses --bowtie2db, translate that option to --db_dir for MetaPhlAn 4.2.
- If an old tutorial uses --bowtie2out, translate the saved mapping workflow to --mapout and --input_type mapout.
- If a long-read run fails, confirm --long_reads is present and inspect metaphlan --help for current Minimap2-related thresholds.
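
The option renames listed above can be applied to a command copied from an older tutorial with a simple text rewrite. This is a rough sketch, not an upstream tool: the function name is hypothetical, it works on the command as plain text, and it would also alter file names that happen to contain the old option strings, so review the result before running it.

```shell
#!/bin/sh
# Hypothetical helper: rewrite option names from older tutorials to the
# MetaPhlAn 4.2 names. Takes a command line as text ($1) and prints the
# rewritten text; it does not run anything.
translate_old_flags() {
  printf '%s\n' "$1" | sed \
    -e 's/--input_type[ =]bowtie2out/--input_type mapout/g' \
    -e 's/--bowtie2out/--mapout/g' \
    -e 's/--bowtie2db/--db_dir/g'
}

# Example:
#   translate_old_flags 'metaphlan reads.fq --bowtie2db db --bowtie2out reads.bz2'
```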
The smoke test covers:
- wrapper metadata and help
- MetaPhlAn 4.2.4 runtime identity
- key helper commands and aligner/runtime dependencies
- a small offline functional path that does not need the production database
- an explicit missing-database error path
It does not replace full scientific validation on production datasets.
TAFFISH app packaging: Apache-2.0.
Upstream MetaPhlAn is MIT-licensed. Cite MetaPhlAn according to upstream guidance, including the MetaPhlAn 4 Nature Biotechnology paper and the StrainPhlAn/MetaPhlAn method papers when applicable: