taffish/mlst

mlst

mlst packages Torsten Seemann's mlst for TAFFISH. It scans assembled contig/genome files against traditional PubMLST typing schemes and reports the best matching MLST scheme, sequence type, and alleles.

Package identity:

  • name: mlst
  • command: taf-mlst
  • kind: tool
  • version: 2.35.0-r1
  • container image: ghcr.io/taffish/mlst:2.35.0-r1
  • TAFFISH app license: Apache-2.0
  • upstream: tseemann/mlst, tag/release v2.35.0
  • runtime version: mlst 2.35.0

What This App Packages

This app provides the upstream mlst command plus its normal runtime dependencies in a TAFFISH container:

  • mlst: scan FASTA, GenBank, or EMBL assemblies against PubMLST scheme data.
  • mlst-make_blast_db: rebuild BLAST indices after adding or replacing scheme data.
  • blastn and makeblastdb: BLAST+ commands used by mlst.
  • any2fasta, Perl, gzip, bzip2, and unzip: input conversion and compressed input support used by upstream workflows.

The image copies the upstream/Bioconda mlst=2.35.0 scripts, MLST Perl modules, and PubMLST-derived database snapshot from a builder stage, then installs only the runtime pieces needed by mlst in the final image. BLAST is pinned to 2.16.0 for multi-architecture Bioconda compatibility.

Usage

Default upstream command:

taf-mlst -- --full contigs.fa
taf-mlst -- --scheme neisseria --full --csv contigs.fa
taf-mlst -- --fofn assemblies.txt --full --outfile mlst.tsv

Access upstream help and version:

taf-mlst -- --help
taf-mlst -- --version

Run an explicit packaged command inside the same TAFFISH app environment:

taf-mlst mlst contigs.fa
taf-mlst mlst --info
taf-mlst blastn -version
taf-mlst mlst-make_blast_db

taf-mlst --help shows this TAFFISH app manual. Use taf-mlst -- --help when you want the upstream mlst help. Because TAFFISH automatic command mode treats a non-option first argument as a container command, use taf-mlst mlst contigs.fa when the first upstream argument is a filename.
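
For scripts that pass arbitrary filenames, one option is to always use the explicit-command form, so a leading filename is never mistaken for a container command. The following is a minimal sketch, not part of TAFFISH: the function name `taf_mlst_files` and the `TAF_MLST_DRY_RUN` variable are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper (not part of TAFFISH): always route through the
# explicit "mlst" container command so a leading filename is safe.
taf_mlst_files() {
  if [ "${TAF_MLST_DRY_RUN:-0}" = "1" ]; then
    # Dry run: print the command line instead of executing it.
    printf 'taf-mlst mlst'
    printf ' %s' "$@"
    printf '\n'
  else
    taf-mlst mlst "$@"
  fi
}

# Preview the command that would run for two assemblies.
TAF_MLST_DRY_RUN=1 taf_mlst_files contigs.fa other.fa
```

With the dry-run variable unset, the same call executes `taf-mlst mlst contigs.fa other.fa`.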

Inputs

mlst accepts one or more assembly/genome files in FASTA, GenBank, or EMBL format. Upstream also supports compressed input such as .gz, .bz2, and .zip through packaged helper tools.

For many files, use upstream --fofn:

printf '%s\n' sample1.fa sample2.fa > assemblies.txt
taf-mlst -- --fofn assemblies.txt --full --outfile mlst.tsv
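
When the assemblies live in a directory tree, the list can be generated rather than typed. This is a sketch under assumptions: the helper name `make_fofn`, the `./assemblies` directory, and the `.fa`/`.fasta`/`.fna` extensions are illustrative choices, not something the app requires. The listed paths must be visible inside the container.

```shell
# Collect assembly files under a directory into a file-of-filenames,
# one path per line, sorted for reproducible ordering.
# The extensions matched here are an assumption; adjust to your data.
make_fofn() {
  dir=$1
  out=$2
  find "$dir" -type f \( -name '*.fa' -o -name '*.fasta' -o -name '*.fna' \) \
    | sort > "$out"
}

# Example usage (hypothetical paths):
#   make_fofn ./assemblies assemblies.txt
#   taf-mlst -- --fofn assemblies.txt --full --outfile mlst.tsv
```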

Output Notes

Default output is tab-separated and has no header. It reports the input file, scheme, ST, and allele calls. Upstream recommends --full for a stable header with these columns:

Column   Meaning
FILE     input file
SCHEME   selected or forced MLST scheme
ST       sequence type, or - when no exact ST is assigned
STATUS   upstream quality/status code such as PERFECT or NOVEL
SCORE    upstream match score
ALLELES  semicolon-separated allele calls

Use --csv for comma-separated output and --outfile FILE to write output to a file instead of stdout.
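
A --full report written with --outfile can be post-processed with standard tools. The sketch below lists samples whose STATUS is anything other than PERFECT. It assumes tab-separated output with a header row and the column order given in the table above. The helper name `flag_imperfect` is hypothetical, and the sample rows, including the scores and allele strings, are invented placeholders rather than real mlst output.

```shell
# Print FILE, ST, and STATUS for rows whose STATUS is not PERFECT.
# Assumed column order: FILE SCHEME ST STATUS SCORE ALLELES (header on line 1).
flag_imperfect() {
  awk -F '\t' 'NR > 1 && $4 != "PERFECT" { print $1 "\t" $3 "\t" $4 }' "$1"
}

# Made-up rows for illustration only; not real mlst output.
report=$(mktemp)
printf 'FILE\tSCHEME\tST\tSTATUS\tSCORE\tALLELES\n' > "$report"
printf 'sample1.fa\tneisseria\t11\tPERFECT\t100\tabcZ(2);adk(3)\n' >> "$report"
printf 'sample2.fa\tneisseria\t-\tNOVEL\t90\tabcZ(2);adk(~3)\n' >> "$report"

flag_imperfect "$report"
```

For --csv output, the same idea applies with `-F ','` as the field separator.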

Databases

This image includes the database snapshot bundled with upstream/Bioconda mlst 2.35.0, exposed at:

/opt/mlst/db

The image sets:

MLST_DBDIR=/opt/mlst/db

mlst 2.35.0 supports MLST_DBDIR; it changes the default BLAST database prefix to MLST_DBDIR/blast/mlst.fa and the default scheme directory to MLST_DBDIR/pubmlst. Explicit upstream options --blastdb and --datadir override those defaults.
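
The default-path rule can be written out as a small helper, which is handy when checking what a given MLST_DBDIR value implies. This is a sketch only: the function name `mlst_default_paths` is hypothetical, and it simply mirrors the rule stated above without querying mlst itself.

```shell
# Print the default BLAST prefix and scheme directory implied by a
# database root. Falls back to MLST_DBDIR, then to the image default.
mlst_default_paths() {
  root=${1:-${MLST_DBDIR:-/opt/mlst/db}}
  printf 'blastdb=%s/blast/mlst.fa\n' "$root"
  printf 'datadir=%s/pubmlst\n' "$root"
}

# Show the defaults for a database root mounted at /db.
mlst_default_paths /db
```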

Upstream no longer provides a simple built-in PubMLST update script. Current PubMLST access for fresh scheme data may require a user account, API access, and the separate mlstdb workflow described by upstream. This TAFFISH app does not download or update PubMLST data at runtime.

For a custom database already prepared on the host, either pass explicit paths inside the mounted working directory:

taf-mlst -- --datadir ./mlst-db/pubmlst --blastdb ./mlst-db/blast/mlst.fa contigs.fa

or mount a fixed database root and set MLST_DBDIR through your backend run arguments:

TAFFISH_DOCKER_RUN_ARGS='-v /host/mlst-db:/db:ro -e MLST_DBDIR=/db' \
  TAFFISH_CONTAINER_BACKEND=docker \
  taf-mlst -- --full contigs.fa

After adding a private scheme under pubmlst/SCHEME, run mlst-make_blast_db inside the same database root before using it.
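
Before mounting a host database root, a quick layout check can catch a missing BLAST index early. The sketch below assumes the root follows the MLST_DBDIR layout described earlier, with a pubmlst directory and a blast/mlst.fa file. The function name `check_mlst_db_root` is hypothetical, and the check covers only presence, not content.

```shell
# Hypothetical pre-flight check for a host database root.
# Expects ROOT/pubmlst (scheme directories) and ROOT/blast/mlst.fa.
check_mlst_db_root() {
  root=$1
  rc=0
  if [ ! -d "$root/pubmlst" ]; then
    echo "missing directory: $root/pubmlst" >&2
    rc=1
  fi
  if [ ! -f "$root/blast/mlst.fa" ]; then
    echo "missing file: $root/blast/mlst.fa (run mlst-make_blast_db?)" >&2
    rc=1
  fi
  return "$rc"
}

# Example usage (hypothetical path):
#   check_mlst_db_root /host/mlst-db && echo "layout looks complete"
```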

Boundaries

This app exposes upstream mlst and the bundled database snapshot. It does not provide a PubMLST account, API key, live database update service, downstream epidemiology interpretation, or species identification outside what mlst infers from the available schemes.

The included smoke fixture is synthetic and checks container/runtime integrity. It is not a validation of every PubMLST scheme or production typing result.

Platform Notes

The package is built for linux/amd64 and linux/arm64. The Bioconda mlst package is noarch, while BLAST+ and runtime dependencies are installed for the target platform during image build.

Testing

The smoke test covers:

  • command presence for mlst, mlst-make_blast_db, BLAST+, any2fasta, Perl, and compression helpers
  • exact runtime version mlst 2.35.0
  • upstream help and mlst --info against the bundled database
  • dependency checks for BLAST 2.16.0, any2fasta 0.8.1, and required Perl modules
  • a synthetic positive MLST call generated from the bundled PubMLST snapshot
  • compressed FASTA input plus --full --csv --outfile

It does not download PubMLST data or run exhaustive scheme-level validation.

License and Citation

TAFFISH app packaging is Apache-2.0.

Upstream mlst is GPL-2.0-only. Bundled PubMLST-derived data retain their source citation and access-policy requirements. Upstream asks users to cite PubMLST and the mlst GitHub project. For PubMLST, cite Jolley et al., Wellcome Open Research 2018;3:124, DOI 10.12688/wellcomeopenres.14826.1, PMID 30345391.

Upstream project: https://github.com/tseemann/mlst
