Commit b2a86ed

Merge remote-tracking branch 'iqtree3_wiki/master'
2 parents d53e02e + 79ee0e0 commit b2a86ed

70 files changed

Lines changed: 15559 additions & 0 deletions

doc/Advanced-Tutorial.md

Lines changed: 542 additions & 0 deletions
Large diffs are not rendered by default.

doc/AliSim.md

Lines changed: 675 additions & 0 deletions
Large diffs are not rendered by default.

doc/Analyzing-Big-Data.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
---
2+
layout: userdoc
3+
title: "Analyzing Big Data"
4+
author: _AUTHOR_
5+
date: _DATE_
6+
docid: 31
7+
icon: info-circle
8+
doctype: manual
9+
tags:
10+
- tutorial
11+
description: "Hints and strategies to analyze big alignments with >= 1000 sequences or >= 10,000 sites."
12+
sections:
13+
---
14+
15+
Analyzing big data
16+
==================
17+
18+
Hints and strategies to analyze big alignments with >= 1000 sequences or >= 10,000 sites.
19+
20+
21+
<!--more-->
22+
23+
24+
TODO
25+
Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
1+
---
2+
layout: userdoc
3+
title: "Assessing Phylogenetic Assumptions"
4+
author: _AUTHOR_
5+
date: _DATE_
6+
docid: 5
7+
icon: info-circle
8+
doctype: tutorial
9+
tags:
10+
- tutorial
11+
description: This guide is about evaluating the suitability of the data for phylogenetic analysis.
12+
sections:
13+
- name: Tests of symmetry
14+
url: tests-of-symmetry
15+
- name: Likelihood mapping
16+
url: likelihood-mapping
17+
18+
---
19+
20+
21+
Assessing phylogenetic assumptions
22+
==================================
23+
24+
Phylogenetic models rely on various simplifying assumptions to
25+
ease computations. If your data severely violate these assumptions, it might
26+
cause bias in phylogenetic estimates of tree topologies and other model
27+
parameters. Some common assumptions include _treelikeness_ (all sites
28+
in the alignment have evolved under the same tree), _stationarity_ (nucleotide/amino-acid
29+
frequencies remain constant over time), _reversibility_ (substitutions are equally
30+
likely in both directions), and _homogeneity_ (substitution rates remain constant over time).
31+
32+
This document shows several checks of these assumptions that you
33+
should run before doing phylogenetic analysis.
34+
35+
Tests of symmetry
36+
-----------------
37+
38+
IQ-TREE provides three matched-pairs tests of symmetry ([Naser-Khdour et al., 2019]) to
39+
test the two assumptions of _stationarity_ and _homogeneity_.
40+
A simple analysis:
41+
42+
iqtree3 -s example.phy -p example.nex --symtest-only
43+
44+
will perform the three tests of symmetry on every partition of the alignment
45+
and print the result into a `.symtest.csv` file. The `--symtest-only` option tells
46+
IQ-TREE to only perform the tests of symmetry and then exit.
47+
In this example the content of `example.nex.symtest.csv` looks like this:
48+
49+
```
50+
# Matched-pair tests of symmetry
51+
# This file can be read in MS Excel or in R with command:
52+
# dat=read.csv('example.nex.symtest.csv',comment.char='#')
53+
# Columns are comma-separated with following meanings:
54+
# Name: Partition name
55+
# SymSig: Number of significant sequence pairs by test of symmetry
56+
# SymNon: Number of non-significant sequence pairs by test of symmetry
57+
# SymPval: P-value for maximum test of symmetry
58+
# MarSig: Number of significant sequence pairs by test of marginal symmetry
59+
# MarNon: Number of non-significant sequence pairs by test of marginal symmetry
60+
# MarPval: P-value for maximum test of marginal symmetry
61+
# IntSig: Number of significant sequence pairs by test of internal symmetry
62+
# IntNon: Number of non-significant sequence pairs by test of internal symmetry
63+
# IntPval: P-value for maximum test of internal symmetry
64+
Name,SymSig,SymNon,SymPval,MarSig,MarNon,MarPval,IntSig,IntNon,IntPval
65+
part1,44,92,0.475639,50,86,0.722371,4,132,0.23869
66+
part2,43,93,0.142052,49,87,0.205232,5,131,0.169618
67+
part3,53,83,0.00499855,58,78,0.00164132,6,130,0.343127
68+
```
69+
70+
The three important columns are:
71+
72+
* SymPval: a small p-value (say < 0.05) indicates that the assumptions of stationarity
73+
or homogeneity, or both, are rejected. In this case, partition `part3` does not comply with these
74+
two assumptions (p-value = 0.00499855), whereas the other two partitions are "good".
75+
* MarPval: a small p-value means that the assumption of stationarity is rejected. In
76+
this case, only partition `part3` does not comply with the stationarity assumption (p-value = 0.00164132).
77+
* IntPval: a small p-value means that the homogeneity assumption is rejected. In
78+
this case, no partitions are "bad" according to this test, i.e., they all comply with
79+
the homogeneity assumption.
80+
81+
This little example shows that only `part3` is problematic by not complying with the
82+
stationarity assumption.
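
The same reading of the table can be scripted. The following is a minimal sketch in Python, assuming the CSV layout shown above; the helper names `read_symtest` and `rejected` are hypothetical and not part of IQ-TREE, and the data rows are copied from the example output rather than read from disk.

```python
import csv
import io

# Rows copied from the example output above. For a real run, open the
# `.symtest.csv` file written by IQ-TREE instead of using this string.
SYMTEST_CSV = """\
# Matched-pair tests of symmetry
Name,SymSig,SymNon,SymPval,MarSig,MarNon,MarPval,IntSig,IntNon,IntPval
part1,44,92,0.475639,50,86,0.722371,4,132,0.23869
part2,43,93,0.142052,49,87,0.205232,5,131,0.169618
part3,53,83,0.00499855,58,78,0.00164132,6,130,0.343127
"""


def read_symtest(handle):
    """Parse a symtest CSV, skipping the '#' comment lines (hypothetical helper)."""
    data_lines = (line for line in handle if not line.startswith("#"))
    return list(csv.DictReader(data_lines))


def rejected(rows, column, cutoff=0.05):
    """Return the names of partitions whose p-value in `column` is below `cutoff`."""
    return [row["Name"] for row in rows if float(row[column]) < cutoff]


rows = read_symtest(io.StringIO(SYMTEST_CSV))
for column in ("SymPval", "MarPval", "IntPval"):
    print(column, rejected(rows, column))
```

For the example data this lists `part3` under `SymPval` and `MarPval` and nothing under `IntPval`, matching the interpretation given in the list above.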
83+
84+
Now you may want to perform the phylogenetic analysis excluding all "bad" partitions by:
85+
86+
iqtree3 -s example.phy -p example.nex --symtest-remove-bad
87+
88+
that will remove all "bad" partitions where SymPval < 0.05 and continue the analysis with the
89+
remaining "good" partitions. You may then compare the trees from "all" partitions
90+
and from "good" only partitions to see if there is a significant difference between them
91+
with [tree topology tests](Advanced-Tutorial#tree-topology-tests).
92+
93+
Other options can be seen when running `iqtree3 -h`:
94+
95+
```
96+
TEST OF SYMMETRY:
97+
--symtest Perform three tests of symmetry
98+
--symtest-only Do --symtest then exist
99+
--symtest-remove-bad Do --symtest and remove bad partitions
100+
--symtest-remove-good Do --symtest and remove good partitions
101+
--symtest-type MAR|INT Use MARginal/INTernal test when removing partitions
102+
--symtest-pval NUMER P-value cutoff (default: 0.05)
103+
--symtest-keep-zero Keep NAs in the tests
104+
```
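
To see how the choice of test and cutoff changes which partitions count as "bad", here is a small Python sketch that applies the rule described in this section (a partition is "bad" when its p-value is below the cutoff) to the example p-values. It only illustrates the selection rule; it is not IQ-TREE's implementation. The function `split_partitions` and the label `"SYM"` for the default test are hypothetical names, and the handling of NAs (`--symtest-keep-zero`) is not modelled.

```python
# P-values copied from the example `.symtest.csv` output in this section.
PVALUES = {
    "SYM": {"part1": 0.475639, "part2": 0.142052, "part3": 0.00499855},
    "MAR": {"part1": 0.722371, "part2": 0.205232, "part3": 0.00164132},
    "INT": {"part1": 0.23869, "part2": 0.169618, "part3": 0.343127},
}


def split_partitions(test="SYM", cutoff=0.05):
    """Split partition names into (good, bad) lists for one test and cutoff."""
    good, bad = [], []
    for name, pval in PVALUES[test].items():
        # A partition is treated as "bad" when its p-value is below the cutoff.
        (bad if pval < cutoff else good).append(name)
    return good, bad


print(split_partitions())             # default test, cutoff 0.05
print(split_partitions("INT"))        # internal symmetry test
print(split_partitions("SYM", 0.15))  # a looser cutoff
```

With the default settings only `part3` is classed as bad; with the internal test no partition is; and raising the cutoff to 0.15 additionally classes `part2` as bad, because its `SymPval` of 0.142052 falls below that cutoff.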
105+
106+
107+
Likelihood mapping
108+
------------------
109+
<div class="hline"></div>
110+
111+
Likelihood mapping ([Strimmer and von Haeseler, 1997]) is a visualisation method
112+
to display the phylogenetic information of an alignment. It visualises the _treelikeness_
113+
of all quartets in a single triangular graph and therefore allows a quick
114+
interpretation of the phylogenetic content.
115+
116+
A simple likelihood mapping analysis can be conducted with:
117+
118+
iqtree3 -s example.phy -lmap 2000 -n 0
119+
120+
where the `-lmap` option specifies the number of quartets of taxa that will be drawn randomly
121+
from the alignment. `-n 0` tells IQ-TREE to stop the analysis right after running the
122+
likelihood mapping. IQ-TREE will print the result in the `.iqtree` report file as well
123+
as the likelihood mapping plot `.lmap.svg` (in SVG format) and `.lmap.eps` file (in EPS
124+
figure format).
125+
126+
You can now view the likelihood mapping plot file `example.phy.lmap.svg`, which looks like this:
127+
128+
![Likelihood mapping plot.](images/example.phy.lmap.pdf)
129+
130+
It shows phylogenetic information of the alignment `example.phy`.
131+
132+
* Top sub-figure: distribution of quartets depicted by dots on the likelihood mapping plot.
133+
* Left sub-figure: percentages of quartets falling in each of the three areas. The
134+
three areas show support for one of the different groupings like (a,b)-(c,d).
135+
* Right sub-figure: percentages of quartets falling in each of the seven areas.
136+
Quartets falling into the three corners are informative and called fully-resolved quartets.
137+
Those in three rectangles are partly informative (partly resolved quartets) and those in the center are uninformative
138+
(unresolved quartets). A good data set should have a high number of fully resolved quartets
139+
and a low number of unresolved quartets.
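
As an illustration of how a single quartet ends up in one of the seven areas, the Python sketch below converts the log-likelihoods of the three possible quartet topologies into posterior weights and assigns the quartet to the nearest of seven reference points (three corners, three edge midpoints, and the centre). This assumes the areas are the "basins of attraction" described by Strimmer and von Haeseler (1997); the area numbers follow the diagram in the report file, and the function names are hypothetical. IQ-TREE's own code may differ in detail.

```python
import math

# Reference points in the order of the weights (p1, p2, p3) for the
# topologies (a,b)-(c,d), (a,c)-(b,d) and (a,d)-(b,c).
# Area numbers are read off the 7-area diagram (assumption).
ATTRACTORS = {
    1: (1.0, 0.0, 0.0),    # corner: fully resolved, (a,b)-(c,d)
    2: (0.0, 1.0, 0.0),    # corner: fully resolved, (a,c)-(b,d)
    3: (0.0, 0.0, 1.0),    # corner: fully resolved, (a,d)-(b,c)
    4: (0.5, 0.5, 0.0),    # edge between areas 1 and 2: partly resolved
    5: (0.0, 0.5, 0.5),    # edge between areas 2 and 3: partly resolved
    6: (0.5, 0.0, 0.5),    # edge between areas 1 and 3: partly resolved
    7: (1 / 3, 1 / 3, 1 / 3),  # centre: unresolved
}


def posterior_weights(log_likelihoods):
    """Normalise three log-likelihoods into weights that sum to 1."""
    top = max(log_likelihoods)
    # Subtract the maximum before exponentiating to avoid underflow.
    rel = [math.exp(x - top) for x in log_likelihoods]
    total = sum(rel)
    return tuple(r / total for r in rel)


def area(weights):
    """Return the number of the reference point closest to `weights`."""
    def squared_distance(key):
        return sum((w - a) ** 2 for w, a in zip(weights, ATTRACTORS[key]))
    return min(ATTRACTORS, key=squared_distance)


print(area(posterior_weights((-1000.0, -1004.0, -1005.0))))  # one clear winner
print(area(posterior_weights((-1000.0, -1000.1, -1000.2))))  # nearly equal support
```

A quartet with one clearly best topology lands in a corner area, while a quartet whose three topologies have almost equal likelihood lands in the central area 7.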
140+
141+
The meanings can also be found in the `LIKELIHOOD MAPPING STATISTICS` section of the
142+
report file `example.phy.iqtree`:
143+
144+
145+
LIKELIHOOD MAPPING STATISTICS
146+
-----------------------------
147+
148+
(a,b)-(c,d) (a,b)-(c,d)
149+
/\ /\
150+
/ \ / \
151+
/ \ / 1 \
152+
/ a1 \ / \ / \
153+
/\ /\ / \/ \
154+
/ \ / \ / /\ \
155+
/ \ / \ / 6 / \ 4 \
156+
/ \/ \ /\ / 7 \ /\
157+
/ | \ / \ /______\ / \
158+
/ a3 | a2 \ / 3 | 5 | 2 \
159+
/__________|_________\ /_____|________|_____\
160+
(a,d)-(b,c) (a,c)-(b,d) (a,d)-(b,c) (a,c)-(b,d)
161+
162+
Division of the likelihood mapping plots into 3 or 7 areas.
163+
On the left the areas show support for one of the different groupings
164+
like (a,b|c,d).
165+
On the right the right quartets falling into the areas 1, 2 and 3 are
166+
informative. Those in the rectangles 4, 5 and 6 are partly informative
167+
and those in the center (7) are not informative.
168+
.....
169+
170+
171+
The [command reference](Command-Reference#likelihood-mapping-analysis) describes
172+
further options and explains how to perform 2-, 3-, or 4-cluster likelihood mapping analysis.
173+
174+
175+
[Strimmer and von Haeseler, 1997]: http://www.pnas.org/content/94/13/6815.long
176+
[Naser-Khdour et al., 2019]: https://doi.org/10.1093/gbe/evz193
177+
