Commit acd046b

Support predefined mutations in AliSim
1 parent df07bbe commit acd046b

1 file changed

Lines changed: 28 additions & 1 deletion

doc/AliSim.md

@@ -552,6 +552,31 @@ AliSim supports the [FunDi model](https://doi.org/10.1093/bioinformatics/btr470)
552552

553553
This example simulates a new alignment under the Jukes-Cantor model from the input tree `tree.nwk` with the default sequence length of 1,000 sites. Because the user specifies the FunDi model with `<RHO>` = 0.1, 100 random sites (sequence length * `<RHO>` = 1,000 * 0.1) are permuted with each other in the sequences of Taxon A and Taxon C.
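
As a rough illustration of the arithmetic and the permutation step described above, here is a minimal Python sketch. It is not AliSim's implementation: the function name `permute_sites` and the random test sequence are hypothetical, and how AliSim actually selects the sites or shares them between taxa is not specified here.

```python
import random

def permute_sites(seq, rho, rng):
    """Shuffle the characters found at round(len(seq) * rho) randomly chosen sites."""
    n = round(len(seq) * rho)                 # e.g. 1,000 * 0.1 = 100 sites
    chosen = sorted(rng.sample(range(len(seq)), n))
    shuffled = chosen[:]
    rng.shuffle(shuffled)                     # random permutation of the chosen sites
    out = list(seq)
    for dst, src in zip(chosen, shuffled):
        out[dst] = seq[src]                   # move the character from site src to site dst
    return "".join(out), chosen

rng = random.Random(1)                        # fixed seed, for reproducibility only
seq = "".join(rng.choice("ACGT") for _ in range(1000))  # hypothetical sequence
permuted, chosen = permute_sites(seq, 0.1, rng)
print(len(chosen))                            # number of sites involved in the permutation
```

Sites outside the chosen set are left untouched, so the overall base composition of the sequence is unchanged.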
554554

555+
Pre-define mutations
556+
----------------------------
557+
AliSim allows users to pre-define mutations that occur on specific branches of the tree. To do so, one needs to: (1) specify an ancestral sequence at the root of the tree by adding `--root-seq <ALN_FILE>,<SEQ_NAME>` to the execution command; then (2) specify those mutations in the input tree file.
558+
559+
Assume that we have an alignment named `root_aln.phy` that contains the ancestral sequence `S1`, as shown below. (Note that `S2` and `S3` are not required to be present.)
560+
561+
3 40
562+
S1 GTTTACTGGCAGATTTTCATAGATGATGTAAGATCAGACA
563+
S2 GTTTACAGGCATATTTTCATAGATGATGTAAGTTCAGACA
564+
S3 GTTTACTGGCAGATTTTCATTGATGATGTAAGATCAGACA
565+
566+
One can specify a list of pre-defined mutations that occur on each branch using `[&mutations={<list_of_mutations>}]` in the tree file. Mutations in the list are separated by a forward slash `/`, as in the following tree file `tree_mutations.nwk`.
567+
568+
(T1:0.2,(T2[&mutations={C25A/A5G}]:0.3,T4:0.1)I1[&mutations={C39G/T17A/G25C}]:0.4,T3:0.1);
569+
570+
In the above tree, we specify:
571+
572+
* Three mutations `C39G` (i.e., C is substituted by G at site 39), `T17A`, and `G25C` occur along the branch connecting the root node and the internal node `I1`;
573+
* Two mutations `C25A` and `A5G` occur along the branch connecting the internal node `I1` and taxon `T2`.
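
To make the annotation syntax concrete, the following Python sketch extracts the `[&mutations={...}]` lists from the tree string above and splits each mutation into (ancestral state, site, derived state). It is only an illustration and not how AliSim parses trees: the function `parse_mutations` and both regular expressions are hypothetical, and the sketch assumes that every annotated node has a label directly before the annotation.

```python
import re

# Hypothetical patterns: a node label followed by its mutation annotation,
# and a single mutation such as C39G (ancestral state, site, derived state).
ANNOTATION = re.compile(r"(\w+)\[&mutations=\{([^}]*)\}\]")
MUTATION = re.compile(r"([A-Za-z])(\d+)([A-Za-z])")

def parse_mutations(newick):
    """Map each annotated node label to its list of (old, site, new) tuples."""
    result = {}
    for label, body in ANNOTATION.findall(newick):
        parsed = []
        for token in body.split("/"):         # mutations are separated by '/'
            match = MUTATION.fullmatch(token.strip())
            if match is None:
                raise ValueError(f"cannot parse mutation {token!r}")
            parsed.append((match.group(1), int(match.group(2)), match.group(3)))
        result[label] = parsed
    return result

tree = ("(T1:0.2,(T2[&mutations={C25A/A5G}]:0.3,T4:0.1)"
        "I1[&mutations={C39G/T17A/G25C}]:0.4,T3:0.1);")
mutations = parse_mutations(tree)
for node, muts in mutations.items():
    print(node, muts)
```

Each annotation is attached to the node at the lower end of the branch, so the entry for `I1` describes the branch from the root to `I1`, and the entry for `T2` describes the branch from `I1` to `T2`.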
574+
575+
The following command
576+
577+
iqtree2 --alisim example_mutations --root-seq root_aln.phy,S1 -t tree_mutations.nwk -m JC
578+
579+
will simulate an alignment with 4 sequences (each with 40 sites) under the [Jukes-Cantor model](http://doi.org/10.1016/B978-1-4832-3211-9.50009-7) where sites 5, 17, 25, and 39 are substituted according to the above pre-defined mutations.
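
The effect of the pre-defined mutations on the path from the root to `T2` can be traced by hand or with a short Python sketch like the one below. It is a simplified illustration, not AliSim itself: it applies only the listed mutations and ignores the random substitutions that the Jukes-Cantor model adds along each branch. The helper `apply_mutations` is hypothetical, and the check that the ancestral state matches the current sequence is an assumption made for this sketch.

```python
def apply_mutations(seq, mutations):
    """Apply (old, site, new) mutations in order; sites are 1-based."""
    sites = list(seq)
    for old, site, new in mutations:
        if sites[site - 1] != old:            # sanity check assumed for this sketch
            raise ValueError(f"site {site} is {sites[site - 1]}, expected {old}")
        sites[site - 1] = new
    return "".join(sites)

root = "GTTTACTGGCAGATTTTCATAGATGATGTAAGATCAGACA"   # sequence S1 from root_aln.phy

# Branch root -> I1: C39G, T17A, G25C
i1 = apply_mutations(root, [("C", 39, "G"), ("T", 17, "A"), ("G", 25, "C")])
# Branch I1 -> T2: C25A, A5G (site 25 is C here because of G25C above)
t2 = apply_mutations(i1, [("C", 25, "A"), ("A", 5, "G")])

changed = [i + 1 for i, (a, b) in enumerate(zip(root, t2)) if a != b]
print(changed)                                # sites where T2 differs from the root
```

Note that `C25A` on the branch to `T2` is consistent only because `G25C` has already changed site 25 from G to C on the branch above it.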
555580

556581
Parallel sequence simulations
557582
----------------------------
@@ -564,7 +589,7 @@ This example simulates a new alignment under the Juke-Cantor model from the inpu
564589
**NOTES**:
565590

566591
- The performance of AliSim-OpenMP-IM is affected by a memory limit factor (0.2 by default; valid range (0, 1]): a small factor may increase the runtime, whereas a large factor increases memory consumption. To set this memory limit factor, use the `--mem-limit <FACTOR>` option.
567-
- In AliSim-OpenMP-EM algorithm, the simulated sequences will be written in an arbitrary order to the alignment (which is not a matter in most phylogenetic software). However, if users want to maintain the sequence order (based on the preorder traversal of the tree), they can use `--keep-seq-order` option, but it will sacrifice a certain runtime.
592+
- If using the AliSim-OpenMP-EM algorithm, the simulated sequences are written to the alignment in an arbitrary order (which does not matter for most phylogenetic software). To maintain the sequence order (based on the preorder traversal of the tree), use the `--keep-seq-order` option, at the cost of some additional runtime.
568593
- If using the AliSim-OpenMP-EM algorithm, one can use `--no-merge` to skip the concatenation step and save runtime. Note that, when simulating an alignment of length L with K threads, AliSim will output the alignment as K sub-alignment files of L/K sites each.
569594

570595
To simulate many alignments, one can use the MPI version of AliSim:
@@ -579,6 +604,8 @@ To simulate many large alignments, users can employ both MPI and OpenMP on a hig
579604

580605
This example uses 10 MPI processes, each having 4 threads (i.e., a total of 40 threads will be run), to simulate 100 large alignments under the Jukes-Cantor model from the input tree `tree.nwk` with a sequence length of 1,000,000 sites.
581606

607+
**NOTES**: Our MPI implementation supports indels, as does the original version of AliSim, whereas the OpenMP algorithm does not. Therefore, to simulate many alignments with indels, one can employ MPI only, without OpenMP.
608+
582609
Command reference
583610
-----------------
584611
