Commit 70c4cd5

Merge branch 'master' of https://github.com/iqtree/iqtree2.wiki
2 parents f7bb660 + 4055dc9 commit 70c4cd5

3 files changed

Lines changed: 47 additions & 6 deletions

doc/Complex-Models.md

Lines changed: 33 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -146,6 +146,38 @@ Mixture models can be combined with rate heterogeneity, e.g.:
146146

147147
Here, we specify two mixture components and four Gamma rate categories. Effectively, this means that there are eight mixture components. Each site has a probability of belonging to either `JC` or `HKY` and to one of the four rate categories.
148148
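The count of eight follows from pairing every substitution model with every rate category. A minimal sketch of that arithmetic, where the category labels are hypothetical names chosen only for illustration:

```python
from itertools import product

# Two substitution models in the mixture, as in the example above.
substitution_models = ["JC", "HKY"]

# Four Gamma rate categories; these labels are hypothetical placeholders.
rate_categories = [f"G4_cat{i}" for i in range(1, 5)]

# Each effective mixture component is one (model, rate category) pair.
components = list(product(substitution_models, rate_categories))

for model, category in components:
    print(model, category)

print(len(components))
```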

149+
### MixtureFinder
150+
151+
MixtureFinder is an approach to select the optimum number of classes and the substitution model in each class for a mixture model of Q matrices. To run MixtureFinder:
152+
153+
iqtree -s example.phy -m MF+MIX
154+
155+
Here, we estimate the optimal Q mixture model. To select the mixture model and then perform the tree search:
156+
157+
iqtree -s example.phy -m MFP+MIX
158+
159+
By default, a likelihood ratio test (LRT) with a p-value threshold of 0.05 is used to assess the number of classes in the Q mixture model. To change the p-value threshold:
160+
161+
iqtree -s example.phy -m MF+MIX -lrt 0.01
162+
163+
Here, we change the LRT p-value to 0.01. To use information criteria instead of LRT to assess the number of classes:
164+
165+
iqtree -s example.phy -m MF+MIX -lrt 0 -merit BIC
166+
167+
Here, `-lrt 0` turns off the LRT and `-merit BIC` uses BIC to assess the number of classes. (Note that `-merit` also sets the criterion for selecting the substitution model type in each class. If the LRT is used to assess the number of classes, the default criterion for selecting the substitution model type is BIC.)
168+
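To make the BIC-based choice concrete, here is a rough sketch and not IQ-TREE's actual implementation. It uses the standard definition BIC = -2 lnL + k ln(n), with n taken to be the number of alignment sites, and assumes that classes are added one at a time until the score stops improving or a cap corresponding to `-qmax` is reached. The function names and the numbers in the usage lines are made up for illustration.

```python
import math

def bic(log_likelihood, n_params, n_sites):
    """Standard Bayesian information criterion; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)

def pick_num_classes(fits, n_sites, qmax=10):
    """Pick a number of classes by BIC (illustrative stopping rule).

    fits[i] is a (log_likelihood, n_params) pair for a mixture with
    i + 1 classes. Classes are added while BIC decreases, up to qmax.
    """
    best_k = 1
    best_score = bic(fits[0][0], fits[0][1], n_sites)
    for k, (lnl, n_params) in enumerate(fits[1:qmax], start=2):
        score = bic(lnl, n_params, n_sites)
        if score >= best_score:
            break  # adding another class no longer improves BIC
        best_k, best_score = k, score
    return best_k

# Made-up log-likelihoods and parameter counts for 1, 2 and 3 classes.
example_fits = [(-5200.0, 20), (-5100.0, 30), (-5095.0, 40)]
chosen = pick_num_classes(example_fits, n_sites=1000)
print(chosen)
```

With these made-up values the third class improves the likelihood only slightly, so the penalty for its extra parameters outweighs the gain and the two-class mixture is kept.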
169+
Options for ModelFinder also work for MixtureFinder, e.g.:
170+
171+
iqtree -s example.phy -m MF+MIX -mset HKY,GTR -mrate E,I,G,I+G
172+
173+
The `-mset HKY,GTR` option restricts the choice of substitution model type to `HKY` and `GTR` in each iteration of adding one more class. The `-mrate E,I,G,I+G` option restricts the choice of rate heterogeneity across sites models to `+E`, `+I`, `+G` and `+I+G`.
174+
175+
Other options for MixtureFinder:
176+
| Model option | Description |
177+
| -------------- | ------------------------------------------------------------------------------------------------------------- |
178+
| `-qmax` | Maximum number of Q-mixture classes (default: 10). Specify a number after the option (e.g., `-qmax 5`). |
179+
| `-mrate-twice` | Re-estimate the rate heterogeneity across sites model after selecting the best Q-mixture model (default: on). |
180+
149181

150182
### Profile mixture models
151183

@@ -368,7 +400,7 @@ In the above command, all trees share the same GTR model, DNA frequencies and ga
368400
| `.treefile` | By using the MAST model, IQ-TREE will report multiple trees inside this file. Their topologies should match the input topologies in the newick file. |
369401
| `.iqtree` | All the estimated model parameters for each tree and the tree weights (i.e. proportions of the sites belonging to the tree and the model) are shown in this file. The order of the tree weights follows the order of the input topologies in the newick file. |
370402

371-
Please note that, in any MAST model with more than one substitution model (i.e. models 1 - 5 in the previous table), the weights can only be interpreted as the linked weight of the model and the tree. So the weights are not unique to the tree. In other words, IQ-TREE will report the weights pertaining only to the trees for the model 6 in the previous table.
403+
Please note that, in any MAST model with more than one substitution model (i.e. models 1 - 5 in the previous table), the weights can only be interpreted as the linked weight of the model and the tree. So the weights are not unique to the tree. In other words, IQ-TREE will report the weights pertaining only to the trees for the model 6 in the previous table.
372404

373405
[Brown et al. (2013)]: https://doi.org/10.1098/rspb.2013.1755
374406
[Lartillot and Philippe, 2004]: https://doi.org/10.1093/molbev/msh112

doc/Frequently-Asked-Questions.md

Lines changed: 8 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -85,8 +85,7 @@ How does IQ-TREE treat gap/missing/ambiguous characters?
8585
<div class="hline"></div>
8686

8787
Gaps (`-`) and missing characters (`?` or `N` for DNA alignments) are treated in the same way as `unknown` characters, which represent no information. The same treatment holds for many other ML software (e.g., RAxML, PhyML). More explicitly,
88-
for a site (column) of an alignment containing `AC-AG-A` (i.e. A for sequence 1, C for sequence 2, `-` for sequence 3, and so on), the site-likelihood
89-
of a tree T is equal to the site-likelihood of the subtree of T restricted to those sequences containing non-gap characters (`ACAGA`).
88+
for a site (column) of an alignment containing `AC-AG-A` (i.e. A for sequence 1, C for sequence 2, `-` for sequence 3, and so on), the site-likelihood of a tree T is equal to the site-likelihood of the subtree of T restricted to those sequences containing non-gap characters (`ACAGA`).
9089
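The restriction step can be sketched as follows. This only shows which sequences remain for a given site; it does not compute a likelihood, and the taxon names and function name are hypothetical.

```python
# Characters treated here as carrying no information (gap and missing).
# For DNA, `N` is handled the same way; it is left out of this small sketch.
GAP_LIKE = set("-?")

def restrict_site(column, taxa):
    """Return the taxa and characters of one site after dropping gaps."""
    kept = [(taxon, ch) for taxon, ch in zip(taxa, column) if ch not in GAP_LIKE]
    kept_taxa = [taxon for taxon, _ in kept]
    pattern = "".join(ch for _, ch in kept)
    return kept_taxa, pattern

taxa = [f"seq{i}" for i in range(1, 8)]  # hypothetical names seq1..seq7
kept_taxa, pattern = restrict_site("AC-AG-A", taxa)
print(kept_taxa, pattern)
```

The site-likelihood is then evaluated on the subtree spanning only the kept taxa.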

9190
Ambiguous characters that represent more than one character are also supported: each represented character will have equal likelihood. For DNA the following ambiguous nucleotides are supported according to [IUPAC nomenclature](https://en.wikipedia.org/wiki/Nucleic_acid_notation):
9291

@@ -102,17 +101,20 @@ Ambiguous characters that represent more than one character are also supported:
102101
| H | A, C or T (next letter after G) |
103102
| D | A, G or T (next letter after C) |
104103
| V | A, G or C (next letter after T) |
105-
| ?, -, ., ~, O, N, X | A, G, C or T (unknown; all 4 nucleotides are equally likely) |
104+
| ?, -, ., ~, !, O, N, X | A, G, C or T (unknown; all 4 nucleotides are equally likely) |
106105
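One common way to express "each represented character has equal likelihood" is a tip vector with the same value for every compatible nucleotide. The sketch below assumes that representation and the standard IUPAC codes; it is an illustration, not IQ-TREE's internal data structure, and the function name is hypothetical.

```python
STATES = "ACGT"

# Standard IUPAC nucleotide codes mapped to the nucleotides they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "B": "CGT", "H": "ACT", "D": "AGT", "V": "ACG",
}

# Characters treated as unknown: all four nucleotides are equally likely.
UNKNOWN = set("?-.~!ONX")

def tip_vector(ch):
    """Return a 4-entry vector over A, C, G, T: 1.0 if compatible, else 0.0."""
    ch = ch.upper()
    allowed = STATES if ch in UNKNOWN else IUPAC[ch]
    return [1.0 if state in allowed else 0.0 for state in STATES]

print(tip_vector("H"))  # A, C or T
print(tip_vector("-"))  # unknown
```

Characters outside both tables raise a `KeyError` in this sketch, which makes unsupported symbols easy to spot.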

107-
For protein the following ambiguous amino-acids are supported:
106+
For protein sequences the following ambiguous amino-acids are supported:
108107

109108
| Amino-acid | Meaning |
110109
|------------|---------------------------------------------------------------|
111110
| B | N or D |
112111
| Z | Q or E |
113112
| J | I or L |
114113
| U | unknown AA (although it is the 21st AA) |
115-
| ?, -, ., ~, * or X | unknown AA (all 20 AAs are equally likely) |
114+
| ?, -, ., ~, *, ! or X | unknown AA (all 20 AAs are equally likely) |
115+
116+
The characters `*` and `!` may be found in alignments of protein and/or coding DNA sequences.
117+
A stop codon is typically translated to `*`. Some alignment programs also mark frameshift mutations (cf. [Ranwez et al., 2011]): because a frameshift in a codon alignment produces incomplete codons that cannot be translated unambiguously, the resulting position in the translated protein sequence and the padding positions in the respective codon are marked with `!`.
116118

117119

118120
Can I mix DNA and protein data in a partitioned analysis?
@@ -290,4 +292,5 @@ C A
290292

291293
[Guindon et al., 2010]: https://doi.org/10.1093/sysbio/syq010
292294
[Minh et al., 2013]: https://doi.org/10.1093/molbev/mst024
295+
[Ranwez et al., 2011]: https://doi.org/10.1371/journal.pone.0022594
293296

doc/Tutorial.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -81,6 +81,12 @@ This tiny alignment contains 7 DNA sequences from several animals with the seque
8181
>**TIP**: From version 2 you can input a directory of alignment files. IQ-TREE 2 will load and concatenate all alignments within the directory, eliminating the need for users to manually perform this step.
8282
{: .tip}
8383

84+
Not all special characters are allowed in sequence names, because they may interfere with the structure encoding in Newick tree files. To avoid problems with downstream software (like tree viewers), IQ-TREE (like other phylogenetic software) checks the names for such potentially interfering characters and replaces them with underscores `_`.
85+
Permitted characters in sequence names are alphanumeric characters, underscore `_`, dash `-`, dot `.`, slash `/` and vertical bar `|`. All other characters are replaced; for example, `hawk's-eye` is converted to `hawk_s-eye`, which is how it will appear in the tree.
86+
87+
Please note that this can lead to duplicate names if, for instance, you already have two sequences named `hawk_s-eye` and `hawk's-eye`. In such cases you will get an error and need to adjust the names in the original input alignment.
88+
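To check names before running an analysis, the substitution rule can be approximated as below. This is a sketch under two assumptions: that "alphanumeric" means ASCII letters and digits, and that the permitted slash is the forward slash `/`. The function names are hypothetical and this is not IQ-TREE's own code.

```python
import re
from collections import defaultdict

# Anything other than ASCII letters, digits, `_`, `-`, `.`, `|` and `/`
# (the slash being an assumption) is replaced by an underscore.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_\-.|/]")

def sanitize_name(name):
    """Replace every non-permitted character with an underscore."""
    return _DISALLOWED.sub("_", name)

def find_collisions(names):
    """Group original names that become identical after sanitizing."""
    groups = defaultdict(list)
    for name in names:
        groups[sanitize_name(name)].append(name)
    return {new: old for new, old in groups.items() if len(old) > 1}

names = ["hawk_s-eye", "hawk's-eye", "owl"]
print(sanitize_name("hawk's-eye"))
print(find_collisions(names))
```

Any non-empty result from `find_collisions` points to names that should be changed in the input alignment before running IQ-TREE.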
89+
8490
First running example
8591
---------------------
8692
<div class="hline"></div>
