doc/Complex-Models.md (33 additions, 1 deletion)
@@ -146,6 +146,38 @@ Mixture models can be combined with rate heterogeneity, e.g.:
Here, we specify two mixture components and four Gamma rate categories. Effectively, this means that there are eight mixture components. Each site has a probability of belonging to either `JC` or `HKY` and to one of the four rate categories.
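As a rough illustration of why two mixture classes and four Gamma rate categories behave like eight components, the following Python sketch enumerates the combinations. The class weights are made-up values, and the sketch assumes equally probable rate categories and that a component's weight is the product of its class weight and category probability. It is not output from, or code used by, IQ-TREE.

```python
from itertools import product

# Hypothetical weights of the two substitution-model classes (sum to 1).
class_weights = {"JC": 0.4, "HKY": 0.6}
# Four rate categories, assumed equally probable for this sketch.
rate_categories = {f"G{i}": 0.25 for i in range(1, 5)}

# Each (model, rate category) pair acts as one effective mixture component.
components = {
    (model, cat): w_model * w_cat
    for (model, w_model), (cat, w_cat) in product(
        class_weights.items(), rate_categories.items()
    )
}

print(len(components))  # 8
```

The component weights still sum to one, so each site is spread over all eight (model, rate) combinations.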
+### MixtureFinder
+
+MixtureFinder is an approach to select the optimum number of classes and the substitution model in each class for a mixture model of Q matrices. To run MixtureFinder:
+
+    iqtree -s example.phy -m MF+MIX
+
+Here, we estimate the optimal Q mixture model. To select the mixture model and then perform the tree search:
+
+    iqtree -s example.phy -m MFP+MIX
+
+A likelihood ratio test (LRT) with a p-value threshold of 0.05 is the default method to assess the number of classes in the Q mixture model. To change the p-value threshold:
+
+    iqtree -s example.phy -m MF+MIX -lrt 0.01
+
+Here, we change the LRT p-value threshold to 0.01. To use an information criterion instead of the LRT to assess the number of classes:
+
+    iqtree -s example.phy -m MF+MIX -lrt 0 -merit BIC
+
+Here, `-lrt 0` turns off the LRT, and `-merit BIC` uses BIC to assess the number of classes. (Note that `-merit` also sets the criterion for selecting the substitution model type in each class. If the LRT is used to assess the number of classes, the default criterion for selecting the substitution model type is BIC.)
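To make the idea of choosing the number of classes by an information criterion concrete, here is a minimal Python sketch that computes BIC with the standard formula, BIC = -2 lnL + k ln(n), and picks the class count with the lowest score. The log-likelihoods, parameter counts and alignment length are hypothetical numbers chosen for illustration, and the selection loop is a simplification, not MixtureFinder's actual procedure.

```python
import math


def bic(log_likelihood, num_params, num_sites):
    """Bayesian information criterion; lower values are preferred."""
    return -2.0 * log_likelihood + num_params * math.log(num_sites)


# Hypothetical fits: number of classes -> (log-likelihood, free parameters).
fits = {1: (-5200.0, 20), 2: (-5100.0, 30), 3: (-5095.0, 40)}
num_sites = 1000  # hypothetical alignment length

scores = {k: bic(lnl, p, num_sites) for k, (lnl, p) in fits.items()}
best_k = min(scores, key=scores.get)
print(best_k, round(scores[best_k], 2))
```

With these made-up numbers, the third class improves the likelihood too little to offset its extra parameters, so two classes are preferred.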
+
+Options for ModelFinder also work for MixtureFinder, e.g.:
+The `-mset HKY,GTR` option means that we select the substitution model type among only `HKY` and `GTR` in each iteration of adding one more class. The `-mrate E,I,G,I+G` option means that we select the model of rate heterogeneity across sites among `+E`, `+I`, `+G` and `+I+G`.
+|`-qmax`| Maximum number of Q-mixture classes (default: 10). Specify a number after the option (e.g., `-qmax 5`). |
+|`-mrate-twice`| Estimate the model of rate heterogeneity across sites again after selecting the best Q-mixture model (default: on). |
+
### Profile mixture models
@@ -368,7 +400,7 @@ In the above command, all trees share the same GTR model, DNA frequencies and ga
|`.treefile`| By using the MAST model, IQ-TREE will report multiple trees inside this file. Their topologies should match the input topologies in the newick file. |
|`.iqtree`| All the estimated model parameters for each tree and the tree weights (i.e. proportions of the sites belonging to the tree and the model) are shown in this file. The order of the tree weights follows the order of the input topologies in the newick file. |
-Please note that, in any MAST model with more than one substitution model (i.e. models 1 - 5 in the previous table), the weights can only be interpreted as the linked weight of the model and the tree. So the weights are not unique to the tree. In other words, IQ-TREE will report the weights pertaining only to the trees for the model 6 in the previous table.
+Please note that, in any MAST model with more than one substitution model (i.e. models 1 - 5 in the previous table), the weights can only be interpreted as the linked weight of the model and the tree, so the weights are not unique to the tree. In other words, IQ-TREE reports weights pertaining only to the trees solely for model 6 in the previous table.
[Brown et al. (2013)]: https://doi.org/10.1098/rspb.2013.1755
[Lartillot and Philippe, 2004]: https://doi.org/10.1093/molbev/msh112
doc/Frequently-Asked-Questions.md (8 additions, 5 deletions)
@@ -85,8 +85,7 @@ How does IQ-TREE treat gap/missing/ambiguous characters?
<div class="hline"></div>
Gaps (`-`) and missing characters (`?` or `N` for DNA alignments) are treated in the same way as `unknown` characters, which represent no information. The same treatment holds for many other ML software (e.g., RAxML, PhyML). More explicitly,
-for a site (column) of an alignment containing `AC-AG-A` (i.e. A for sequence 1, C for sequence 2, `-` for sequence 3, and so on), the site-likelihood
-of a tree T is equal to the site-likelihood of the subtree of T restricted to those sequences containing non-gap characters (`ACAGA`).
+for a site (column) of an alignment containing `AC-AG-A` (i.e. A for sequence 1, C for sequence 2, `-` for sequence 3, and so on), the site-likelihood of a tree T is equal to the site-likelihood of the subtree of T restricted to those sequences containing non-gap characters (`ACAGA`).
Ambiguous characters that represent more than one character are also supported: each represented character will have equal likelihood. For DNA the following ambiguous nucleotides are supported according to [IUPAC nomenclature](https://en.wikipedia.org/wiki/Nucleic_acid_notation):
@@ -102,17 +101,20 @@ Ambiguous characters that represent more than one character are also supported:
| H | A, C or T (next letter after G) |
| D | A, G or T (next letter after C) |
| V | A, G or C (next letter after T) |
-| ?, -, ., ~, O, N, X | A, G, C or T (unknown; all 4 nucleotides are equally likely) |
+| ?, -, ., ~, !, O, N, X | A, G, C or T (unknown; all 4 nucleotides are equally likely) |
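One common way to express "each represented character will have equal likelihood" is a 0/1 vector over the four nucleotides at the tip of the tree, with 1.0 for every state the character is compatible with. The Python sketch below illustrates this for the codes listed in the table rows shown here (`H`, `D`, `V` and the unknown characters). The function name `tip_vector` and the vector representation are assumptions made for illustration, not IQ-TREE's internal code.

```python
STATES = "ACGT"

# Only the ambiguity codes visible in the table rows above are included.
AMBIGUITY = {
    "H": "ACT",  # A, C or T
    "D": "AGT",  # A, G or T
    "V": "AGC",  # A, G or C
}
# Characters treated as unknown: all four nucleotides are equally likely.
UNKNOWN = set("?-.~!ONX")


def tip_vector(char):
    """Return a 0/1 vector over A, C, G, T for a single alignment character."""
    char = char.upper()
    if char in UNKNOWN:
        allowed = set(STATES)
    elif len(char) == 1 and char in STATES:
        allowed = {char}
    elif char in AMBIGUITY:
        allowed = set(AMBIGUITY[char])
    else:
        raise ValueError(f"character not covered by this sketch: {char!r}")
    return [1.0 if state in allowed else 0.0 for state in STATES]


print(tip_vector("-"))  # [1.0, 1.0, 1.0, 1.0]
print(tip_vector("H"))  # [1.0, 1.0, 0.0, 1.0]
```

A gap therefore contributes no information: every state is equally compatible with it.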
-For protein the following ambiguous amino-acids are supported:
+For protein sequences the following ambiguous amino-acids are supported:
-| ?, -, ., ~, * or X | unknown AA (all 20 AAs are equally likely) |
+| ?, -, ., ~, *, ! or X | unknown AA (all 20 AAs are equally likely) |
+
+The characters `*` and `!` may be found in alignments of protein and/or coding DNA sequences.
+A stop codon is typically translated to `*`. Some alignment programs also mark frameshift mutations (cf. [Ranwez et al., 2011]): because a frameshift in a codon alignment produces incomplete codons that cannot be unambiguously translated, the resulting position in the translated protein sequence and the padding positions in the respective codon are marked with `!`.
Can I mix DNA and protein data in a partitioned analysis?
@@ -290,4 +292,5 @@ C A
[Guindon et al., 2010]: https://doi.org/10.1093/sysbio/syq010
[Minh et al., 2013]: https://doi.org/10.1093/molbev/mst024
+[Ranwez et al., 2011]: https://doi.org/10.1371/journal.pone.0022594
doc/Tutorial.md (6 additions, 0 deletions)
@@ -81,6 +81,12 @@ This tiny alignment contains 7 DNA sequences from several animals with the seque
>**TIP**: From version 2 you can input a directory of alignment files. IQ-TREE 2 will load and concatenate all alignments within the directory, eliminating the need for users to manually perform this step.
{: .tip}
+Not all special characters are allowed in sequence names, because they may interfere with the structure encoded in Newick tree files. To avoid problems with downstream software (such as tree viewers), IQ-TREE (like other phylogenetic software) checks the names for such potentially interfering characters and replaces them with underscores `_`.
+Permitted characters in sequence names are alphanumeric characters, underscores `_`, dashes `-`, dots `.`, slashes `/` and vertical bars `|`. All other characters are replaced; e.g. `hawk's-eye` is converted to `hawk_s-eye`, which is how it will appear in the tree.
+
+Please note that this can lead to duplicate names if, for instance, you already have two sequences named `hawk_s-eye` and `hawk's-eye`. In such cases you will obtain an error and need to adjust the names in the original input alignment.
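To check an alignment for such collisions before running an analysis, the renaming rule can be approximated in a few lines of Python. This is a sketch under stated assumptions, not IQ-TREE's own code: it treats "alphanumeric" as ASCII letters and digits, reads "slash" as the forward slash `/`, and the helper names `sanitize` and `find_collisions` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed permitted set: ASCII alphanumerics, '_', '-', '.', '/', '|'.
_NOT_ALLOWED = re.compile(r"[^A-Za-z0-9_\-./|]")


def sanitize(name):
    """Replace every character outside the permitted set with '_'."""
    return _NOT_ALLOWED.sub("_", name)


def find_collisions(names):
    """Map each sanitized name to the original names that collapse onto it."""
    groups = defaultdict(list)
    for name in names:
        groups[sanitize(name)].append(name)
    return {clean: origs for clean, origs in groups.items() if len(origs) > 1}


print(sanitize("hawk's-eye"))  # hawk_s-eye
print(find_collisions(["hawk_s-eye", "hawk's-eye", "owl"]))
```

Any non-empty result from `find_collisions` points to names that should be changed in the input alignment.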