Merge branch 'master' of https://github.com/iqtree/iqtree2.wiki

trongnhanuit · trongnhanuit · commit 70c4cd5996fe · 2023-12-01T21:10:18.000+11:00
diff --git a/doc/Complex-Models.md b/doc/Complex-Models.md
@@ -146,6 +146,38 @@ Mixture models can be combined with rate heterogeneity, e.g.:
 
 Here, we specify two mixture components and four Gamma rate categories. Effectively, this means that there are eight mixture components. Each site has a probability belonging to either `JC` or `HKY` and to one of the four rate categories.
 
+### MixtureFinder
+
+MixtureFinder is an approach to select the optimum number of classes and the substitution model in each class for a mixture model of Q matrices. To run MixtureFinder:
+
+	iqtree -s example.phy -m MF+MIX
+	
+Here, we estimate the optimal Q mixture model. To select the mixture model and then perform the tree search:
+
+	iqtree -s example.phy -m MFP+MIX
+	
+By default, a likelihood ratio test (LRT) with a p-value threshold of 0.05 is used to assess the number of classes in the Q mixture model. To change the threshold:
+
+	iqtree -s example.phy -m MF+MIX -lrt 0.01
+	
+Here, we change the LRT p-value threshold to 0.01. To use an information criterion instead of the LRT to assess the number of classes:
+
+	iqtree -s example.phy -m MF+MIX -lrt 0 -merit BIC
+	
+Here, `-lrt 0` turns off the LRT and `-merit BIC` selects BIC to assess the number of classes. (Note that `-merit` also sets the criterion for selecting the substitution model type in each class. If the LRT is used to assess the number of classes, the default criterion for selecting the substitution model type is BIC.)
+
+Options for ModelFinder also work for MixtureFinder, e.g.:
+
+	iqtree -s example.phy -m MF+MIX -mset HKY,GTR -mrate E,I,G,I+G
+	
+The `-mset HKY,GTR` option restricts the choice of substitution model type to `HKY` and `GTR` in each iteration that adds one more class. The `-mrate E,I,G,I+G` option restricts the choice of rate heterogeneity across sites model to `+E`, `+I`, `+G` and `+I+G`.
+
+Other options for MixtureFinder:
+| Model option   | Description                                                                                                   |
+| -------------- | ------------------------------------------------------------------------------------------------------------- |
+| `-qmax`        | Maximum number of Q-mixture classes (default: 10). Specify a number after the option (e.g., `-qmax 5`).       |
+| `-mrate-twice` | Re-estimate the rate heterogeneity across sites model after selecting the best Q-mixture model (default: on). |
+
 
 ### Profile mixture models
 
@@ -368,7 +400,7 @@ In the above command, all trees share the same GTR model, DNA frequencies and ga
 | `.treefile` | By using the MAST model, IQ-TREE will report multiple trees inside this file. Their topologies should match the input topologies in the newick file. |
 | `.iqtree` | All the estimated model parameters for each tree and the tree weights (i.e. proportions of the sites belonging to the tree and the model) are shown in this file. The order of the tree weights follows the order of the input topologies in the newick file. |
 
-Please note that, in any MAST model with more than one substitution model (i.e. models 1 - 5 in the previous table), the weights can only be interpreted as the linked weight of the model and the tree. So the weights are not unique to the tree. In other words, IQ-TREE will report the weights pertaining only to the trees for the model 6 in the previous table.
+Please note that, in any MAST model with more than one substitution model (i.e. models 1 - 5 in the previous table), the weights can only be interpreted as the linked weight of the model and the tree. So the weights are not unique to the tree. In other words, IQ-TREE will report the weights pertaining only to the trees for the model 6 in the previous table. 
 
 [Brown et al. (2013)]: https://doi.org/10.1098/rspb.2013.1755
 [Lartillot and Philippe, 2004]: https://doi.org/10.1093/molbev/msh112
diff --git a/doc/Frequently-Asked-Questions.md b/doc/Frequently-Asked-Questions.md
@@ -85,8 +85,7 @@ How does IQ-TREE treat gap/missing/ambiguous characters?
 <div class="hline"></div>
 
 Gaps (`-`) and missing characters (`?` or `N` for DNA alignments) are treated in the same way as `unknown` characters, which represent no information. The same treatment holds for many other ML software (e.g., RAxML, PhyML). More explicitly,
-for a site (column) of an alignment containing `AC-AG-A` (i.e. A for sequence 1, C for sequence 2, `-` for sequence 3, and so on), the site-likelihood
-of a tree T is equal to the site-likelihood of the subtree of T restricted to those sequences containing non-gap characters (`ACAGA`).
+for a site (column) of an alignment containing `AC-AG-A` (i.e. A for sequence 1, C for sequence 2, `-` for sequence 3, and so on), the site-likelihood of a tree T is equal to the site-likelihood of the subtree of T restricted to those sequences containing non-gap characters (`ACAGA`).
 
 Ambiguous characters that represent more than one character are also supported: each represented character will have equal likelihood. For DNA the following ambigous nucleotides are supported according to [IUPAC nomenclature](https://en.wikipedia.org/wiki/Nucleic_acid_notation):
 
@@ -102,17 +101,20 @@ Ambiguous characters that represent more than one character are also supported:
 | H    | A, C or T (next letter after G) |
 | D    | A, G or T (next letter after C) |
 | V    | A, G or C (next letter after T) |
-| ?, -, ., ~, O, N, X | A, G, C or T (unknown; all 4 nucleotides are equally likely) |
+| ?, -, ., ~, !, O, N, X | A, G, C or T (unknown; all 4 nucleotides are equally likely) |
 
-For protein the following ambiguous amino-acids are supported:
+For protein sequences the following ambiguous amino-acids are supported:
 
 | Amino-acid | Meaning |
 |------------|---------------------------------------------------------------|
 | B          | N or D |
 | Z          | Q or E |
 | J          | I or L |
 | U          | unknown AA (although it is the 21st AA) |
-| ?, -, ., ~, * or X | unknown AA (all 20 AAs are equally likely) |
+| ?, -, ., ~, *, ! or X | unknown AA (all 20 AAs are equally likely) |
+
+The characters `*` and `!` may be found in alignments of protein and/or coding DNA sequences. 
+A stop codon is typically translated to `*`. Some alignment programs also mark frameshift mutations (cf. [Ranwez et al., 2011]): because a frameshift in a codon alignment produces incomplete codons that cannot be translated unambiguously, the affected position in the translated protein sequence and the padding positions in the corresponding codon are marked with `!`. 
 
 
 Can I mix DNA and protein data in a partitioned analysis?
@@ -290,4 +292,5 @@ C A
 
 [Guindon et al., 2010]: https://doi.org/10.1093/sysbio/syq010
 [Minh et al., 2013]: https://doi.org/10.1093/molbev/mst024
+[Ranwez et al., 2011]: https://doi.org/10.1371/journal.pone.0022594
 
diff --git a/doc/Tutorial.md b/doc/Tutorial.md
@@ -81,6 +81,12 @@ This tiny alignment contains 7 DNA sequences from several animals with the seque
 >**TIP**: From version 2 you can input a directory of alignment files. IQ-TREE 2 will load and concatenate all alignments within the directory, eliminating the need for users to manually perform this step.
 {: .tip}
 
+Not all special characters are allowed in sequence names, because they may interfere with the structure encoding in Newick tree files. To avoid problems with downstream software (such as tree viewers), IQ-TREE (like other phylogenetic software) checks the names for such potentially interfering characters and replaces them with underscores `_`. 
+Permitted characters in sequence names are alphanumeric characters, underscore `_`, dash `-`, dot `.`, slash `/` and vertical bar `|`. All other characters are replaced: for example, `hawk's-eye` is converted to `hawk_s-eye`, which is how it will appear in the tree.
+
+Please note that this can lead to duplicate names if, for instance, you already have two sequences named `hawk_s-eye` and `hawk's-eye`. In such cases IQ-TREE reports an error, and you need to adjust the names in the original input alignment.
+
+
 First running example
 ---------------------
 <div class="hline"></div>