|
2 | 2 | layout: userdoc |
3 | 3 | title: "Substitution Models" |
4 | 4 | author: Hector Banos, Cuong Cao Dang, Heiko Schmidt, Jana Trifinopoulos, Minh Bui, Nhan Ly-Trong, Hiroaki Sato |
5 | | -date: 2024-05-27 |
| 5 | +date: 2024-05-30 |
6 | 6 | docid: 10 |
7 | 7 | icon: book |
8 | 8 | doctype: manual |
@@ -360,15 +360,26 @@ Binary and morphological models |
360 | 360 |
|
361 | 361 | The binary alignments should contain state `0` and `1`, whereas for morphological data, the valid states are `0` to `9` and `A` to `V`. |
362 | 362 |
|
363 | | -| Model | Explanation | |
364 | | -|---------|------------------------------------------------------------------------| |
365 | | -| JC2 | Jukes-Cantor type model for binary data.| |
366 | | -| GTR2 | General time reversible model for binary data.| |
367 | | -| MK | Jukes-Cantor type model for morphological data.| |
368 | | -| ORDERED | Allowing exchange of neighboring states only.| |
| 363 | +| Model | Explanation | |
| 364 | +|------------|------------------------------------------------------------------------| |
| 365 | +| JC2 | Jukes-Cantor type model for binary data.| |
| 366 | +| GTR2 | General time reversible model for binary data.| |
| 367 | +| MK | Jukes-Cantor type model for morphological data with equal rates.| |
| 368 | +| GTRX (GTR) | General time reversible model with unequal rates for multistate data (including morphological data, but **see the warning below**).| |
| 369 | +| ORDERED | Allowing exchange of neighboring states only.| |
| 370 | + |
| 371 | +Except for `GTR2`, which has unequal state frequencies, all other models assume equal state frequencies. Users can change how state frequencies are modeled in morphological models by appending `+FQ` (equal), `+F` (empirical, counted from the alignment), `+F{...}` (user-specified), or `+FO` (optimized by maximum likelihood). |
369 | 372 |
|
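In IQ-TREE's model syntax, `+FQ` requests equal state frequencies and `+F` requests frequencies counted from the alignment. As a rough illustration of that difference, the sketch below computes both for a toy alignment. The function names and toy data are hypothetical, and treating `?` and `-` as missing while ignoring polymorphism/ambiguity codes is an assumption of this sketch; IQ-TREE's own counting may differ in such details.

```python
from collections import Counter

# Symbols treated as missing data (an assumption of this sketch).
MISSING = {"?", "-"}


def equal_frequencies(states: list[str]) -> dict[str, float]:
    """+FQ-style frequencies: every state in the state space gets 1/k."""
    return {s: 1.0 / len(states) for s in states}


def empirical_frequencies(alignment: dict[str, str], states: list[str]) -> dict[str, float]:
    """+F-style frequencies: relative counts of each state over all non-missing cells."""
    counts = Counter(c for seq in alignment.values() for c in seq if c not in MISSING)
    total = sum(counts[s] for s in states)
    if total == 0:
        raise ValueError("no observed states in the alignment")
    return {s: counts[s] / total for s in states}


if __name__ == "__main__":
    # Hypothetical toy alignment: taxon name -> one symbol per character.
    toy = {"taxonA": "0102", "taxonB": "0?12", "taxonC": "0012"}
    space = ["0", "1", "2"]
    print(equal_frequencies(space))
    print(empirical_frequencies(toy, space))
```

Note that the empirical frequencies pool all characters, so they only carry meaning if a given label (e.g. `0`) means the same thing in every character.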
370 | | -Except for `GTR2` that has unequal state frequencies, all other models have equal state frequencies. |
| 373 | +> **WARNING**: Models with unequal rates and/or unequal frequencies (e.g., `GTR2+FO`, `MK+FO`, `GTRX+FQ`, `GTRX+FO`) should not be applied to general morphological characters (transformational morphological characters; for the term, see [Sereno, 2007]), because their state labels are fundamentally arbitrary. These models are intended for data whose state labels are not arbitrary, such as recoded amino acids (for a practical application, see [Najle et al., 2023] and [xgrau/recoded-mixture-models]) and certain types of genomic information. For morphological data, common practice is to apply the `MK+FQ+ASC` model (or `ORDERED+FQ+ASC` for ordered, i.e. additive, characters; for `+ASC`, see below), with or without parameters for rate heterogeneity across characters. |
371 | 374 |
|
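To make the point about arbitrary labels concrete, the sketch below uses the simplest case: two taxa joined by a branch of length `t`, where the probability of seeing state `i` in one taxon and `j` in the other is `pi_i * P_ij(t)`. It uses an F81-type model (equal exchangeabilities, frequencies `pi`) as an assumed stand-in for `MK` with equal versus unequal frequencies; the function names are hypothetical and this is not IQ-TREE's implementation. Holding one frequency vector fixed, as happens when it is shared by all characters, re-coding a character's states (e.g. swapping which condition is called `0` and which `1`) leaves every probability unchanged under equal frequencies, but not under unequal ones.

```python
import math
from itertools import permutations


def joint_prob(pi: list[float], t: float, i: int, j: int) -> float:
    """pi_i * P_ij(t) under an equal-exchangeability model with frequencies pi.

    Uniform pi gives the Mk model; non-uniform pi gives an F81-type model.
    Rates are scaled so that t is in expected changes per character.
    """
    beta = 1.0 / (1.0 - sum(p * p for p in pi))
    same = 1.0 if i == j else 0.0
    return pi[i] * (pi[j] + (same - pi[j]) * math.exp(-beta * t))


def invariant_to_recoding(pi: list[float], t: float, tol: float = 1e-12) -> bool:
    """True if permuting a character's state labels, with pi held fixed,
    never changes the probability of any two-taxon pattern."""
    k = len(pi)
    for perm in permutations(range(k)):
        for i in range(k):
            for j in range(k):
                recoded = joint_prob(pi, t, perm[i], perm[j])
                if abs(recoded - joint_prob(pi, t, i, j)) > tol:
                    return False
    return True


if __name__ == "__main__":
    print(invariant_to_recoding([1 / 3] * 3, 0.4))      # equal frequencies
    print(invariant_to_recoding([0.6, 0.3, 0.1], 0.4))  # unequal frequencies
```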
| 375 | +> **WARNING**: Because `GTRX` can have a very large number of free parameters (the number grows quadratically with the number of states), use it only if your multistate data set is sufficiently large, and always test model fit. |
| 376 | +
|
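To give a sense of scale for this warning, the sketch below counts the free substitution-model parameters of a general time reversible model with `k` states. It assumes the usual convention that one of the `k(k-1)/2` exchangeabilities is fixed as a reference, and that estimating frequencies adds `k - 1` parameters; branch lengths and rate-heterogeneity parameters are not counted. The function name is hypothetical.

```python
def gtr_free_parameters(k: int, optimize_frequencies: bool = False) -> int:
    """Free parameters of a k-state general time reversible model.

    k*(k-1)/2 exchangeabilities, one fixed as the reference (assumed convention);
    optionally k - 1 estimated frequencies (they must sum to 1).
    """
    if k < 2:
        raise ValueError("need at least two states")
    n = k * (k - 1) // 2 - 1
    if optimize_frequencies:
        n += k - 1
    return n


if __name__ == "__main__":
    # 32 is the largest morphological state space (0-9 and A-V).
    for k in (2, 4, 10, 20, 32):
        print(k, gtr_free_parameters(k), gtr_free_parameters(k, optimize_frequencies=True))
```

For `k = 4` this gives the familiar 5 exchangeabilities of nucleotide GTR; with 32 states the count reaches several hundred, which is why large data sets and model testing matter.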
| 377 | + |
| 378 | +> **TIP**: Recent studies have indicated that applying a single model to morphological characters that differ in their number of states may not be appropriate ([Khakurel et al., 2024]; [Mulvey et al., 2025]; [Huang, 2025 preprint]), so users may need to partition their data by the number of states per character before analyzing them in IQ-TREE. For how to analyze partitioned morphological data in IQ-TREE, and for some caveats, see [davidcerny/GEOS26100-Fall2022], https://davidcerny.github.io/post/teaching_revbayes/, [Černý & Simonoff (2023)], and [ej91016/MorphoParse]. |
| 379 | +{: .tip} |
| 380 | + |
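A minimal sketch of partitioning by number of states: it counts the distinct observed states in each character and writes a NEXUS `sets` block with one `charset` per state count. Several assumptions apply: every cell is a single symbol (polymorphisms such as `{01}` are not handled), `?` and `-` are missing data, "number of states" means observed states, and constant or all-missing characters are dropped because `+ASC` assumes no constant sites. The helper names and the default model string are hypothetical; whether IQ-TREE treats each partition's state space the way you intend is exactly the kind of caveat discussed in the resources above.

```python
from collections import defaultdict

# Symbols treated as missing data (an assumption of this sketch).
MISSING = {"?", "-"}


def observed_state_counts(alignment: dict[str, str]) -> list[int]:
    """Number of distinct non-missing states in each column."""
    seqs = list(alignment.values())
    ncol = len(seqs[0])
    if any(len(s) != ncol for s in seqs):
        raise ValueError("all sequences must have the same length")
    return [len({s[c] for s in seqs} - MISSING) for c in range(ncol)]


def nexus_partition(alignment: dict[str, str], model: str = "MK+FQ+ASC") -> str:
    """Group columns (1-based) by observed state count into NEXUS charsets."""
    groups: dict[int, list[int]] = defaultdict(list)
    for col, n in enumerate(observed_state_counts(alignment), start=1):
        if n >= 2:  # drop constant / all-missing columns (see lead-in)
            groups[n].append(col)
    lines = ["#nexus", "begin sets;"]
    assignments = []
    for n in sorted(groups):
        name = f"states{n}"
        cols = " ".join(map(str, groups[n]))
        lines.append(f"    charset {name} = {cols};")
        assignments.append(f"{model}:{name}")
    lines.append(f"    charpartition by_states = {', '.join(assignments)};")
    lines.append("end;")
    return "\n".join(lines)


if __name__ == "__main__":
    toy = {"A": "01201", "B": "0011?", "C": "10001"}
    print(nexus_partition(toy))
```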
| 381 | +> **TIP**: For binary morphological characters in which `0` represents the ancestral condition and `1` the derived condition (mainly neomorphic, i.e. `absent`/`present`, morphological characters; for the term, see [Sereno, 2007]), it can make sense to allow asymmetrical state frequencies in the model (see e.g. [Pyron, 2017]; [Sun et al., 2018]; https://ms609.github.io/hyoliths/bayesian.html). In IQ-TREE this can be done, for example, with the `GTR2` model. |
| 382 | +{: .tip} |
372 | 383 |
|
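The effect of asymmetrical frequencies on binary characters can be seen in the closed-form transition probabilities of a reversible two-state model, which is what a general time reversible model reduces to with two states. The sketch below assumes this parameterization, with rates normalized to one expected change per unit branch length; we have not checked that IQ-TREE's `GTR2` uses exactly this normalization, and the function name is hypothetical.

```python
import math


def binary_transition_matrix(pi1: float, t: float) -> list[list[float]]:
    """Transition probabilities P(t) of a reversible two-state model.

    pi1 is the frequency of state 1 (e.g. "derived/present"), pi0 = 1 - pi1.
    The rate matrix [[-pi1, pi1], [pi0, -pi0]] is divided by 2*pi0*pi1 so that
    t is in expected changes per character; its non-zero eigenvalue is then
    -1 / (2*pi0*pi1), giving P_ij(t) = pi_j + (delta_ij - pi_j) * exp(-t / (2*pi0*pi1)).
    """
    if not 0.0 < pi1 < 1.0:
        raise ValueError("pi1 must lie strictly between 0 and 1")
    pi = (1.0 - pi1, pi1)
    decay = math.exp(-t / (2.0 * pi[0] * pi[1]))
    return [[pi[j] + ((1.0 if i == j else 0.0) - pi[j]) * decay for j in range(2)]
            for i in range(2)]


if __name__ == "__main__":
    p = binary_transition_matrix(pi1=0.25, t=0.5)
    # With pi1 < 0.5, a 1 -> 0 change is more probable than a 0 -> 1 change.
    print(f"P(0->1) = {p[0][1]:.4f}, P(1->0) = {p[1][0]:.4f}")
```

With equal frequencies (`pi1 = 0.5`) the two off-diagonal entries coincide, which is the symmetric case assumed by `JC2` and `MK`.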
373 | 384 | >**TIP**: If morphological alignments do not contain constant sites (typically the case), then [an ascertainment bias correction model (`+ASC`)](#ascertainment-bias-correction) should be applied to correct the branch lengths for the absence of constant sites. |
374 | 385 | {: .tip} |
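For reference, the conditioning behind this kind of correction can be written as below. It follows the Mkv formulation of Lewis (2001), and we assume that `+ASC` applies the same idea: the likelihood $L_i$ of each observed character is divided by the probability that a character is variable, i.e. one minus the summed likelihoods $L_{c_s}$ of the constant patterns $c_s$ (every taxon in state $s$) over the state space $\mathcal{S}$. The notation is illustrative.

```latex
L_{\mathrm{ASC}}
  = \prod_{i=1}^{n} \frac{L_i}{1 - \sum_{s \in \mathcal{S}} L_{c_s}},
\qquad \mathcal{S} = \{0, 1\} \text{ for binary characters.}
```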
@@ -462,5 +473,15 @@ Users can fix the parameters of the model. For example, `+I{0.2}` will fix the p |
462 | 473 | [Yang, 1995]: http://www.genetics.org/content/139/2/993.abstract |
463 | 474 | [Yang et al., 1998]: http://mbe.oxfordjournals.org/content/15/12/1600.abstract |
464 | 475 | [Zharkikh, 1994]: https://doi.org/10.1007/BF00160155 |
465 | | - |
| 476 | +[Sereno, 2007]: https://doi.org/10.1111/j.1096-0031.2007.00161.x |
| 477 | +[Pyron, 2017]: https://doi.org/10.1093/sysbio/syw068 |
| 478 | +[Sun et al., 2018]: https://doi.org/10.1098/rspb.2018.1780 |
| 479 | +[xgrau/recoded-mixture-models]: https://github.com/xgrau/recoded-mixture-models |
| 480 | +[Najle et al., 2023]: https://doi.org/10.1016/j.cell.2023.08.027 |
| 481 | +[Khakurel et al., 2024]: https://doi.org/10.1093/sysbio/syae033 |
| 482 | +[Mulvey et al., 2025]: https://doi.org/10.1093/sysbio/syae055 |
| 483 | +[Huang, 2025 preprint]: https://doi.org/10.1101/2025.04.22.650124 |
| 484 | +[ej91016/MorphoParse]: https://github.com/ej91016/MorphoParse |
| 485 | +[davidcerny/GEOS26100-Fall2022]: https://github.com/davidcerny/GEOS26100-Fall2022 |
| 486 | +[Černý & Simonoff (2023)]: https://doi.org/10.1038/s41598-023-35784-3 |
466 | 487 |
|