mlst packages Torsten Seemann's mlst for TAFFISH. It scans assembled
contig/genome files against traditional PubMLST typing schemes and reports the
best matching MLST scheme, sequence type, and alleles.

Package identity:
- name: mlst
- command: taf-mlst
- kind: tool
- version: 2.35.0-r1
- container image: ghcr.io/taffish/mlst:2.35.0-r1
- TAFFISH app license: Apache-2.0
- upstream: tseemann/mlst, tag/release v2.35.0
- runtime version: mlst 2.35.0

This app provides the upstream mlst command plus its normal runtime
dependencies in a TAFFISH container:
- mlst: scan FASTA, GenBank, or EMBL assemblies against PubMLST scheme data.
- mlst-make_blast_db: rebuild BLAST indices after adding or replacing scheme data.
- blastn and makeblastdb: BLAST+ commands used by mlst.
- any2fasta, Perl, gzip, bzip2, and unzip: input conversion and compressed input support used by upstream workflows.

The image copies the upstream/Bioconda mlst=2.35.0 scripts, MLST Perl modules,
and PubMLST-derived database snapshot from a builder stage, then installs only
the runtime pieces needed by mlst in the final image. BLAST is pinned to
2.16.0 for multi-architecture Bioconda compatibility.
Default upstream command:
taf-mlst -- --full contigs.fa
taf-mlst -- --scheme neisseria --full --csv contigs.fa
taf-mlst -- --fofn assemblies.txt --full --outfile mlst.tsv

Access upstream help and version:
taf-mlst -- --help
taf-mlst -- --version

Run an explicit packaged command inside the same TAFFISH app environment:
taf-mlst mlst contigs.fa
taf-mlst mlst --info
taf-mlst blastn -version
taf-mlst mlst-make_blast_db

taf-mlst --help shows this TAFFISH app manual. Use taf-mlst -- --help when
you want the upstream mlst help. Because TAFFISH automatic command mode treats
a non-option first argument as a container command, use taf-mlst mlst contigs.fa
when the first upstream argument is a filename.
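
The first-argument rule can be pictured with a small shell sketch. This is only an illustrative model of the behaviour described above, not TAFFISH's actual implementation. The function name `classify_first_arg` and its labels are hypothetical, and treating every other dash-prefixed argument as an app option is an assumption extrapolated from the `--help` case.

```shell
# Illustrative model only: not TAFFISH's actual dispatch code.
# Classify how the first argument after taf-mlst is handled, per this manual.
classify_first_arg() {
  case "${1-}" in
    --) echo "upstream-args" ;;      # taf-mlst -- --full contigs.fa
    -*) echo "app-option" ;;         # taf-mlst --help (assumed for other options too)
    "") echo "no-args" ;;
    *)  echo "container-command" ;;  # taf-mlst mlst contigs.fa
  esac
}

# Show how a few first arguments would be classified under this model.
for first in -- --help mlst contigs.fa; do
  printf '%s -> %s\n' "$first" "$(classify_first_arg "$first")"
done
```

Under this model a bare filename such as contigs.fa lands in the container-command branch, which is why the explicit taf-mlst mlst contigs.fa form is needed.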
mlst accepts one or more assembly/genome files in FASTA, GenBank, or EMBL
format. Upstream also supports compressed input such as .gz, .bz2, and
.zip through packaged helper tools.
For many files, use upstream --fofn:
printf '%s\n' sample1.fa sample2.fa > assemblies.txt
taf-mlst -- --fofn assemblies.txt --full --outfile mlst.tsv

Default output is tab-separated and has no header. It reports the input file,
scheme, ST, and allele calls. Upstream recommends --full for a stable header:

| Column | Meaning |
|---|---|
| FILE | input file |
| SCHEME | selected or forced MLST scheme |
| ST | sequence type, or - when no exact ST is assigned |
| STATUS | upstream quality/status code such as PERFECT or NOVEL |
| SCORE | upstream match score |
| ALLELES | semicolon-separated allele calls |

Use --csv for comma-separated output and --outfile FILE to write output to a
file instead of stdout.
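
As a minimal sketch of post-processing, the snippet below lists samples that received no exact sequence type. It assumes a file written with --full in the default tab-separated layout and the column order from the table above. The sample rows are invented for illustration and are not real typing results, and `untyped_samples` is a hypothetical helper name.

```shell
# Build a stand-in for --full output. These rows are made up for illustration.
sample_tsv=$(mktemp)
printf 'FILE\tSCHEME\tST\tSTATUS\tSCORE\tALLELES\n' > "$sample_tsv"
printf 'sample1.fa\tneisseria\t11\tPERFECT\t100\tgeneA(2);geneB(3)\n' >> "$sample_tsv"
printf 'sample2.fa\tneisseria\t-\tNOVEL\t90\tgeneA(2);geneB(9)\n' >> "$sample_tsv"

# Print FILE and STATUS for rows whose ST column is "-" (no exact ST assigned).
# NR > 1 skips the header row that --full adds.
untyped_samples() {
  awk -F '\t' 'NR > 1 && $3 == "-" { print $1 "\t" $4 }' "$1"
}

untyped=$(untyped_samples "$sample_tsv")
printf '%s\n' "$untyped"
```

With real data, replace the temporary file with the path passed to --outfile. Output written with --csv would need a comma field separator instead.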
This image includes the database snapshot bundled with upstream/Bioconda mlst
2.35.0, exposed at:
/opt/mlst/db
The image sets:
MLST_DBDIR=/opt/mlst/db
mlst 2.35.0 supports MLST_DBDIR; it changes the default BLAST database
prefix to MLST_DBDIR/blast/mlst.fa and the default scheme directory to
MLST_DBDIR/pubmlst. Explicit upstream options --blastdb and --datadir
override those defaults.
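
The path derivation can be written out as a short sketch. The helper name `mlst_default_paths` is hypothetical; it only restates the two defaults described above and does not call mlst.

```shell
# Print the default BLAST database prefix and scheme directory for a database root.
# Falls back to the image default from this manual when no root is given.
mlst_default_paths() {
  root=${1:-/opt/mlst/db}
  printf '%s\n' "$root/blast/mlst.fa" "$root/pubmlst"
}

# Show the defaults that apply to the current environment.
mlst_default_paths "${MLST_DBDIR:-/opt/mlst/db}"
```

The first line printed corresponds to the value --blastdb would override, and the second to the value --datadir would override.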
Upstream no longer provides a simple built-in PubMLST update script. Current
PubMLST access for fresh scheme data may require a user account, API access, and
the separate mlstdb workflow described by upstream. This TAFFISH app does not
download or update PubMLST data at runtime.
For a custom database already prepared on the host, either pass explicit paths inside the mounted working directory:
taf-mlst -- --datadir ./mlst-db/pubmlst --blastdb ./mlst-db/blast/mlst.fa contigs.fa

or mount a fixed database root and set MLST_DBDIR through your backend run
arguments:
TAFFISH_DOCKER_RUN_ARGS='-v /host/mlst-db:/db:ro -e MLST_DBDIR=/db' \
TAFFISH_CONTAINER_BACKEND=docker \
taf-mlst -- --full contigs.fa

After adding a private scheme under pubmlst/SCHEME, run
mlst-make_blast_db inside the same database root before using it.
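
Before mounting a host database root, a quick layout check can catch a missing scheme directory or a forgotten index rebuild. The sketch below is a hypothetical pre-flight helper, not part of this app. It assumes the root follows the MLST_DBDIR layout described earlier (a pubmlst directory plus files sharing the blast/mlst.fa prefix) and only tests that those paths exist, not that the database is valid.

```shell
# Hypothetical pre-flight check for a host database root.
check_mlst_db_root() {
  root=$1
  [ -d "$root/pubmlst" ] || {
    echo "missing scheme directory: $root/pubmlst" >&2
    return 1
  }
  # The BLAST database is a prefix, so accept any file starting with blast/mlst.fa.
  set -- "$root"/blast/mlst.fa*
  [ -e "$1" ] || {
    echo "no BLAST files under $root/blast; run mlst-make_blast_db first" >&2
    return 1
  }
  echo "ok: $root"
}

# Demo on a throwaway directory that mimics the layout.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/pubmlst/SCHEME" "$demo_root/blast"
check_mlst_db_root "$demo_root" || true   # reports missing BLAST files
: > "$demo_root/blast/mlst.fa"            # empty placeholder, not a real database
check_mlst_db_root "$demo_root"
```

A passing check only means the expected paths are present; mlst itself remains the authority on whether the data are usable.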
This app exposes upstream mlst and the bundled database snapshot. It does not
provide a PubMLST account, API key, live database update service, downstream
epidemiology interpretation, or species identification outside what mlst
infers from the available schemes.
The included smoke fixture is synthetic and checks container/runtime integrity. It is not a validation of every PubMLST scheme or production typing result.
The package is built for linux/amd64 and linux/arm64. The Bioconda mlst
package is noarch, while BLAST+ and runtime dependencies are installed for the
target platform during image build.
The smoke test covers:
- command presence for mlst, mlst-make_blast_db, BLAST+, any2fasta, Perl, and compression helpers
- exact runtime version mlst 2.35.0
- upstream help and mlst --info against the bundled database
- dependency checks for BLAST 2.16.0, any2fasta 0.8.1, and required Perl modules
- a synthetic positive MLST call generated from the bundled PubMLST snapshot
- compressed FASTA input plus --full --csv --outfile

It does not download PubMLST data or run exhaustive scheme-level validation.
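
The command-presence item can be approximated with a few lines of shell. This is a hypothetical helper written for illustration, not the packaged smoke test. It assumes it runs in an environment where the listed tools are expected on PATH, such as inside the app container.

```shell
# Print the name of every listed command that is not found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Command names taken from the list of packaged tools in this manual.
missing=$(missing_commands mlst mlst-make_blast_db blastn makeblastdb any2fasta perl gzip bzip2 unzip)
if [ -n "$missing" ]; then
  printf 'missing commands:\n%s\n' "$missing"
else
  echo "all commands present"
fi
```

Version and database checks are left out of this sketch because they depend on the exact output of each tool.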
TAFFISH app packaging is Apache-2.0.
Upstream mlst is GPL-2.0-only. Bundled PubMLST-derived data retain their
source citation and access-policy requirements. Upstream asks users to cite
PubMLST and the mlst GitHub project. For PubMLST, cite Jolley et al.,
Wellcome Open Research 2018;3:124, DOI
10.12688/wellcomeopenres.14826.1,
PMID 30345391.
Upstream project: https://github.com/tseemann/mlst