---
layout: userdoc
title: "Assessing Phylogenetic Assumptions"
author: M Bui
date: 2021-03-11
docid: 5
icon: info-circle
doctype: tutorial
tags:
- tutorial
description: This guide is about evaluating the suitability of the data for phylogenetic analysis.
---

Phylogenetic models rely on various simplifying assumptions to make computation
tractable. If your data severely violate these assumptions, estimates of the tree
topology and other model parameters may be biased. Common assumptions include
_treelikeness_ (all sites in the alignment have evolved under the same tree),
_stationarity_ (nucleotide/amino-acid frequencies remain constant over time),
_reversibility_ (the substitution process looks the same forwards and backwards in time),
and _homogeneity_ (substitution rates remain constant over time).

This document shows several ways to check some of these assumptions. We recommend
performing these checks before any phylogenetic analysis.

Likelihood mapping analysis
---------------------------
<div class="hline"></div>

Likelihood mapping ([Strimmer and von Haeseler, 1997]) is a method to visualise
the phylogenetic information in an alignment. It displays the _treelikeness_ of
quartets (subsets of four taxa) in a single triangular graph, allowing a quick
assessment of the phylogenetic content.

A simple likelihood mapping analysis can be conducted with:

    iqtree -s example.phy -lmap 2000 -n 0

where the `-lmap` option specifies the number of quartets of taxa to draw randomly
from the alignment, and `-n 0` tells IQ-TREE to stop right after the likelihood
mapping. IQ-TREE prints the result to the `.iqtree` report file and draws the
likelihood mapping plot in two formats: `.lmap.svg` (SVG) and `.lmap.eps` (EPS).

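To illustrate what drawing quartets at random means, here is a minimal Python sketch that samples distinct quartets from a list of taxon names. It assumes that no quartet is drawn twice; how IQ-TREE actually samples its quartets is not described here, and `draw_quartets` is a hypothetical helper. With n taxa there are C(n, 4) distinct quartets, so for alignments with many taxa a value such as `-lmap 2000` examines only a sample of them.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple


def draw_quartets(taxa: Sequence[str], n: int, seed: int = 1) -> List[Tuple[str, ...]]:
    """Draw up to n distinct random quartets (sets of four taxa)."""
    total = math.comb(len(taxa), 4)  # number of distinct quartets
    if n >= total:  # asking for at least as many as exist: return every quartet
        return list(itertools.combinations(sorted(taxa), 4))
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < n:
        # sorting makes (a,b,c,d) and (b,a,d,c) count as the same quartet
        chosen.add(tuple(sorted(rng.sample(list(taxa), 4))))
    return sorted(chosen)


if __name__ == "__main__":
    taxa = [f"T{i}" for i in range(1, 18)]  # 17 hypothetical taxon names
    print(math.comb(len(taxa), 4))  # 2380 distinct quartets in total
    print(draw_quartets(taxa, 5))
```
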
You can now view the likelihood mapping plot file `example.phy.lmap.svg`, which looks like this:

It shows the phylogenetic information of the alignment `example.phy`:

* Top sub-figure: the distribution of quartets, each depicted as a dot on the likelihood mapping plot.
* Left sub-figure: the percentage of quartets falling into each of the three areas. Each
  area represents support for one of the three possible groupings, such as (a,b)-(c,d).
* Right sub-figure: the percentage of quartets falling into each of the seven areas.
  Quartets in the three corners are informative and are called fully resolved quartets.
  Those in the three rectangles are partly informative (partly resolved quartets), and those
  in the center are uninformative (unresolved quartets). A good data set should have a high
  proportion of fully resolved quartets and a low proportion of unresolved quartets.

The meaning of these areas is also explained in the `LIKELIHOOD MAPPING STATISTICS` section of the report file `example.phy.iqtree`:

    LIKELIHOOD MAPPING STATISTICS
    -----------------------------

               (a,b)-(c,d)                             (a,b)-(c,d)
                    /\                                      /\
                   /  \                                    /  \
                  /    \                                  /  1 \
                 / a1   \                                / \  / \
                /\      /\                              /   \/   \
               /  \    /  \                            /    /\    \
              /    \  /    \                          / 6  /  \  4 \
             /      \/      \                        /\   /  7 \   /\
            /        |       \                      /  \ /______\ /  \
           / a3      |    a2  \                    / 3  |    5   |  2 \
          /__________|_________\                  /_____|________|_____\
    (a,d)-(b,c)            (a,c)-(b,d)      (a,d)-(b,c)            (a,c)-(b,d)

    Division of the likelihood mapping plots into 3 or 7 areas.
    On the left the areas show support for one of the different groupings
    like (a,b|c,d).
    On the right the right quartets falling into the areas 1, 2 and 3 are
    informative. Those in the rectangles 4, 5 and 6 are partly informative
    and those in the center (7) are not informative.
    .....

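To make the geometry of the plot concrete, here is a minimal Python sketch of the idea described by [Strimmer and von Haeseler, 1997], not IQ-TREE's code. For each quartet, the likelihoods of its three possible topologies are turned into weights that sum to one, and these weights serve as barycentric coordinates of one dot in the triangle. Assigning each dot to its nearest corner (equivalently, to the topology with the largest weight) gives the three-area summary. The function names and the toy log-likelihoods are hypothetical; computing the quartet likelihoods themselves and the exact boundaries of the seven areas are left out.

```python
import math
from typing import List, Sequence, Tuple

# Corners of an equilateral triangle with unit side, laid out as in the diagram
# above: top = (a,b)-(c,d), bottom right = (a,c)-(b,d), bottom left = (a,d)-(b,c).
CORNERS = [(0.5, math.sqrt(3) / 2), (1.0, 0.0), (0.0, 0.0)]


def posterior_weights(log_likelihoods: Sequence[float]) -> List[float]:
    """Convert the three quartet log-likelihoods into weights L_i / (L_1 + L_2 + L_3)."""
    top = max(log_likelihoods)  # subtract the maximum so exp() cannot underflow to 0/0
    likelihoods = [math.exp(lnl - top) for lnl in log_likelihoods]
    total = sum(likelihoods)
    return [lk / total for lk in likelihoods]


def to_point(weights: Sequence[float]) -> Tuple[float, float]:
    """Use the weights as barycentric coordinates to place one dot in the triangle."""
    x = sum(w * cx for w, (cx, _) in zip(weights, CORNERS))
    y = sum(w * cy for w, (_, cy) in zip(weights, CORNERS))
    return x, y


def three_area(weights: Sequence[float]) -> int:
    """Index of the nearest corner, which is the topology with the largest weight."""
    return max(range(3), key=lambda i: weights[i])


def three_area_percentages(quartets: Sequence[Sequence[float]]) -> List[float]:
    """Percentage of quartets falling into each of the three areas."""
    counts = [0, 0, 0]
    for lnls in quartets:
        counts[three_area(posterior_weights(lnls))] += 1
    return [100.0 * c / len(quartets) for c in counts]


if __name__ == "__main__":
    # Made-up log-likelihoods of the three topologies for three quartets.
    quartets = [
        [-1000.0, -1010.0, -1012.0],  # one topology clearly best: dot near the top corner
        [-2000.5, -2000.0, -2015.0],  # two topologies almost tied: dot near an edge
        [-1500.2, -1500.1, -1500.0],  # all nearly equal: dot near the center
    ]
    for lnls in quartets:
        w = posterior_weights(lnls)
        print([round(v, 3) for v in w], to_point(w))
    print(three_area_percentages(quartets))
```
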
The [command reference](Command-Reference#likelihood-mapping-analysis) describes
further options, including how to perform 2-, 3-, or 4-cluster likelihood mapping analysis.

Tests of symmetry
-----------------
<div class="hline"></div>

IQ-TREE provides three matched-pairs tests of symmetry ([Naser-Khdour et al., 2019])
to test the three assumptions of stationarity, reversibility and homogeneity (SRH).
A simple analysis:

    iqtree2 -s example.phy -p example.nex --symtest-only

performs the three tests of symmetry on every partition of the alignment and writes
the results to a `.symtest.csv` file. The `--symtest-only` option tells IQ-TREE to
perform only the tests of symmetry and then exit. In this example, the content of
`example.nex.symtest.csv` looks like this:

```
# Matched-pair tests of symmetry
# This file can be read in MS Excel or in R with command:
#   dat=read.csv('example.nex.symtest.csv',comment.char='#')
# Columns are comma-separated with following meanings:
#   Name:    Partition name
#   SymSig:  Number of significant sequence pairs by test of symmetry
#   SymNon:  Number of non-significant sequence pairs by test of symmetry
#   SymPval: P-value for maximum test of symmetry
#   MarSig:  Number of significant sequence pairs by test of marginal symmetry
#   MarNon:  Number of non-significant sequence pairs by test of marginal symmetry
#   MarPval: P-value for maximum test of marginal symmetry
#   IntSig:  Number of significant sequence pairs by test of internal symmetry
#   IntNon:  Number of non-significant sequence pairs by test of internal symmetry
#   IntPval: P-value for maximum test of internal symmetry
Name,SymSig,SymNon,SymPval,MarSig,MarNon,MarPval,IntSig,IntNon,IntPval
part1,44,92,0.475639,50,86,0.722371,4,132,0.23869
part2,43,93,0.142052,49,87,0.205232,5,131,0.169618
part3,53,83,0.00499855,58,78,0.00164132,6,130,0.343127
```

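The `Sig`/`Non` columns count sequence pairs because each test is applied to every pair of sequences in a partition. For one pair, the data are the divergence matrix n, where n[i][j] counts the sites with state i in the first sequence and state j in the second. The Python sketch below computes the per-pair statistics under the assumption that the three tests follow their standard formulations: Bowker's test of symmetry, Stuart's test of marginal symmetry, and the test of internal symmetry taken as the difference between the two (with the difference of their degrees of freedom). It is an illustration, not IQ-TREE's code: it does not reproduce how the pairwise results are combined into `SymPval`, `MarPval` and `IntPval`, and skipping unobserved (i, j) pairs is our own choice. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def divergence_matrix(seq1: str, seq2: str, states: str = "ACGT") -> Matrix:
    """n[i][j] = number of sites with state i in seq1 and state j in seq2."""
    index = {s: k for k, s in enumerate(states)}
    n = [[0.0] * len(states) for _ in states]
    for a, b in zip(seq1, seq2):
        if a in index and b in index:  # skip gaps and ambiguous characters
            n[index[a]][index[b]] += 1
    return n


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with a positive integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def bowker(n: Matrix) -> Tuple[float, int, float]:
    """Test of symmetry: H0 is n[i][j] = n[j][i] in expectation for all i < j."""
    stat, df = 0.0, 0
    for i in range(len(n)):
        for j in range(i + 1, len(n)):
            s = n[i][j] + n[j][i]
            if s > 0:  # pairs never observed are skipped (our choice)
                stat += (n[i][j] - n[j][i]) ** 2 / s
                df += 1
    return stat, df, (chi2_sf(stat, df) if df > 0 else math.nan)


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(a)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for k in range(c, m + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][k] * x[k] for k in range(r + 1, m))) / aug[r][r]
    return x


def stuart(n: Matrix) -> Tuple[float, int, float]:
    """Test of marginal symmetry: H0 is equal row and column sums of n."""
    k = len(n)
    row = [sum(n[i]) for i in range(k)]
    col = [sum(n[i][j] for i in range(k)) for j in range(k)]
    d = [row[i] - col[i] for i in range(k - 1)]  # last state dropped: the d_i sum to 0
    v = [[-(n[i][j] + n[j][i]) for j in range(k - 1)] for i in range(k - 1)]
    for i in range(k - 1):
        v[i][i] = row[i] + col[i] - 2 * n[i][i]
    try:
        stat = sum(di * xi for di, xi in zip(d, solve(v, d)))
    except ValueError:  # too few changes to estimate the covariance matrix
        return math.nan, k - 1, math.nan
    return stat, k - 1, chi2_sf(stat, k - 1)


def internal(n: Matrix) -> Tuple[float, int, float]:
    """Test of internal symmetry, taken here as Bowker minus Stuart."""
    (sb, dfb, _), (ss, dfs, _) = bowker(n), stuart(n)
    stat, df = sb - ss, dfb - dfs
    return stat, df, (chi2_sf(stat, df) if df > 0 else math.nan)


if __name__ == "__main__":
    # Two made-up DNA sequences, for illustration only.
    n = divergence_matrix("AAAGCCTAACGACGTACGTAC", "GGGATTCCTGTACGTACGTAC")
    for name, test in (("symmetry", bowker), ("marginal", stuart), ("internal", internal)):
        print(name, test(n))
```
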
The three important columns are:

* SymPval: a small p-value (say < 0.05) indicates that the assumption of stationarity,
  homogeneity, or both is rejected. Here, partition `part3` violates at least one of
  these two assumptions (p-value = 0.00499855), whereas the other two partitions are "good".
* MarPval: a small p-value means that the assumption of stationarity is rejected. Here,
  only partition `part3` violates the stationarity assumption (p-value = 0.00164132).
* IntPval: a small p-value means that the homogeneity assumption is rejected. Here, no
  partition is "bad" according to this test, i.e., homogeneity is not rejected for any
  of them.

This small example shows that only `part3` is problematic, as it violates the
stationarity assumption.

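The same check can be scripted. Below is a minimal Python sketch, an alternative to the R command given in the file header, that reads a `.symtest.csv` file, drops the `#` comment lines and lists the partitions whose p-value falls below a cutoff. The helper names `read_symtest` and `bad_partitions` are hypothetical, and the `test` and `cutoff` arguments only loosely echo the `--symtest-type` and `--symtest-pval` options; this is not how IQ-TREE itself selects partitions.

```python
import csv
from typing import Dict, List, Sequence

# Abridged copy of example.nex.symtest.csv shown above (most '#' header lines omitted).
SYMTEST_CSV = """\
# Matched-pair tests of symmetry
Name,SymSig,SymNon,SymPval,MarSig,MarNon,MarPval,IntSig,IntNon,IntPval
part1,44,92,0.475639,50,86,0.722371,4,132,0.23869
part2,43,93,0.142052,49,87,0.205232,5,131,0.169618
part3,53,83,0.00499855,58,78,0.00164132,6,130,0.343127
"""


def read_symtest(text: str) -> List[Dict[str, str]]:
    """Parse .symtest.csv content, dropping lines that start with '#'."""
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    return list(csv.DictReader(lines))


def bad_partitions(rows: Sequence[Dict[str, str]], test: str = "Sym",
                   cutoff: float = 0.05) -> List[str]:
    """Names of partitions whose <test>Pval column (SymPval, MarPval or IntPval) is below cutoff."""
    column = test + "Pval"
    return [row["Name"] for row in rows if float(row[column]) < cutoff]


if __name__ == "__main__":
    rows = read_symtest(SYMTEST_CSV)
    # For the real file: rows = read_symtest(open("example.nex.symtest.csv").read())
    for test in ("Sym", "Mar", "Int"):
        print(test, bad_partitions(rows, test))  # Sym ['part3'], then Mar ['part3'], then Int []
```

With the example file this reports `part3` for the Sym and Mar columns and no partition for Int, matching the reading above.
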
You may now want to perform the phylogenetic analysis excluding all "bad" partitions:

    iqtree2 -s example.phy -p example.nex --symtest-remove-bad

This removes all "bad" partitions with SymPval < 0.05 and continues the analysis with
the remaining "good" partitions. You may then compare the tree inferred from all
partitions with the tree inferred from the "good" partitions only, and use
[tree topology tests](Advanced-Tutorial#tree-topology-tests) to check whether they
differ significantly.

Other options are listed when running `iqtree2 -h`:

```
TEST OF SYMMETRY:
  --symtest              Perform three tests of symmetry
  --symtest-only         Do --symtest then exit
  --symtest-remove-bad   Do --symtest and remove bad partitions
  --symtest-remove-good  Do --symtest and remove good partitions
  --symtest-type MAR|INT Use MARginal/INTernal test when removing partitions
  --symtest-pval NUMBER  P-value cutoff (default: 0.05)
  --symtest-keep-zero    Keep NAs in the tests
```

[Strimmer and von Haeseler, 1997]: http://www.pnas.org/content/94/13/6815.long
[Naser-Khdour et al., 2019]: https://doi.org/10.1093/gbe/evz193
