Skip to content

Commit c989983

Browse files
committed
update website for v2.2.0 with AliSim
1 parent 2b26c0d commit c989983

11 files changed

Lines changed: 832 additions & 33 deletions

_data/members.yml

Lines changed: 21 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,15 @@ members:
4949
- title: home #use for email address
5050
url: https://bqminh.github.io
5151

52+
- name: Nhan Trong Ly
53+
position:
54+
text: sequence simulations.
55+
img: nhan.jpg
56+
gscholar:
57+
social:
58+
- title: home #use for email address
59+
url: https://bqminh.github.io/people/ly/
60+
5261
- name: Olga Chernomor
5362
position:
5463
text: Partition models and phylogenomic search.
@@ -58,14 +67,14 @@ members:
5867
- title: home #use for email address
5968
url: http://www.cibiv.at/people/olga/
6069

61-
- name: Heiko A. Schmidt
70+
- name: Arndt von Haeseler
6271
position:
63-
text: Integration of <a href="http://www.tree-puzzle.de">TREE-PUZZLE</a> features.
64-
img: hschmidt.png
65-
gscholar: https://scholar.google.com/citations?user=8lFCw7IAAAAJ
72+
text: Inspiring ideas and advice.
73+
img: arndt.jpg
74+
gscholar: https://scholar.google.com/citations?user=XqglmcgAAAAJ
6675
social:
67-
- title: home #use for email address
68-
url: http://www.cibiv.at/people/hschmidt/
76+
- title: envelope #use for email address
77+
url: http://www.cibiv.at/people/haeseler/
6978

7079
- name: Dominik Schrempf
7180
position:
@@ -77,14 +86,14 @@ members:
7786
#url: http://i122server.vu-wien.ac.at/pop/lab_members/Dominik_Schrempf.html
7887
url: https://dschrempf.github.io
7988

80-
- name: Arndt von Haeseler
89+
- name: Heiko A. Schmidt
8190
position:
82-
text: Inspiring ideas and advice.
83-
img: arndt.jpg
84-
gscholar: https://scholar.google.com/citations?user=XqglmcgAAAAJ
91+
text: Integration of <a href="http://www.tree-puzzle.de">TREE-PUZZLE</a> features.
92+
img: hschmidt.png
93+
gscholar: https://scholar.google.com/citations?user=8lFCw7IAAAAJ
8594
social:
86-
- title: envelope #use for email address
87-
url: http://www.cibiv.at/people/haeseler/
95+
- title: home #use for email address
96+
url: http://www.cibiv.at/people/hschmidt/
8897

8998
- name: Diep Thi Hoang
9099
position:

assets/img/members/nhan.jpg

30.8 KB
Loading

doc/AliSim.md

Lines changed: 576 additions & 0 deletions
Large diffs are not rendered by default.

doc/Complex-Models.md

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
11
---
22
layout: userdoc
33
title: "Complex Models"
4-
author: Dominik Schrempf, Jana Trifinopoulos, Minh Bui
5-
date: 2019-08-08
4+
author: Dominik Schrempf, Jana Trifinopoulos, Minh Bui, Trongnhan Uit
5+
date: 2021-11-02
66
docid: 11
77
icon: book
88
doctype: manual
@@ -116,7 +116,6 @@ Note that this assumes `part1` has standard genetic code. If not, append `CODON`
116116

117117
Having prepared a partition file, one is ready to start a partitioned analysis with `-q` (edge-equal), `-spp` (edge-proportional) or `-sp` (edge-unlinked) option. See [this tutorial](Advanced-Tutorial#partitioned-analysis-for-multi-gene-alignments) for more details.
118118

119-
120119
Mixture models
121120
--------------
122121
<div class="hline"></div>
@@ -283,9 +282,6 @@ The `-wspm` option will generate a `.siteprob` output file. This contains the pr
283282

284283
iqtree -s data.fst -m GTR+FO*H4 -wspm
285284

286-
287-
288-
289285
[Brown et al. (2013)]: https://doi.org/10.1098/rspb.2013.1755
290286
[Lartillot and Philippe, 2004]: https://doi.org/10.1093/molbev/msh112
291287
[Le et al., 2008a]: https://doi.org/10.1093/bioinformatics/btn445
Lines changed: 164 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,164 @@
1+
---
2+
layout: userdoc
3+
title: "Estimating amino acid substitution models"
4+
author: Cuong Dang
5+
date: 2021-10-16
6+
docid: 8
7+
icon: info-circle
8+
doctype: tutorial
9+
tags:
10+
- tutorial
11+
description: Estimating amino acid substitution models.
12+
sections:
13+
- name: Estimating a model from a single concatenated alignment
14+
url: Estimating a model from a single concatenated alignment
15+
- name: Estimating a model from a folder of alignments
16+
url: Estimating a model from a folder of alignments
17+
---
18+
19+
20+
Estimating amino acid substitution models
21+
==========================
22+
Amino acid substitution models are a key component in phylogenetic analyses of protein sequences. All amino acid models available to date are time-reversible, an assumption designed for computational convenience but not for biological reality. Another significant downside to time-reversible models is that they do not allow inference of rooted trees without outgroups.
23+
In our recent papers ([Dang et al., 2021], [Minh et al., 2021]), we introduced QMaker and nQMaker, both using maximum likelihood approach that respectively allow the estimation of time reversible and time non-reversible amino acid substitution models from a single concatenated alignment or from a set of protein sequence alignments. Both reversible models estimated with QMaker and non-reversible models estimated with nQMaker fit better to empirical alignments than existing models, across a wide range of datasets. Moreover, the non-reversible models recovered correctly the commonly known root placements for the plant and bird trees with high statistical support without the need to use an outgroup.
24+
Six new reversible models (Q.pfam, Q.plant, Q.mammal, Q.bird, Q.insect, Q.yeast) and six new non-reversible models (NQ.pfam, NQ.plant, NQ.mammal, NQ.bird, NQ.insect, NQ.yeast) are implemented in the latest version of IQ-TREE.
25+
This tutorial provides an easy-to-use approach for users to estimate non-reversible models and rooted phylogenies from their own protein datasets. Please make sure
26+
that you use the lateset version of IQ-TREE for full features and cite our manuscripts:
27+
28+
> Bui Quang Minh, Cuong Cao Dang, Le Sy Vinh, and Robert Lanfear (2021), QMaker: Fast and Accurate Method to Estimate Empirical Models of Protein Evolution. <https://doi.org/10.1093/sysbio/syab010>
29+
30+
> Cuong Cao Dang, Bui Quang Minh, Hanon McShea, Joanna Masel, Jennifer Eleanor James, Le Sy Vinh, and Robert Lanfear (2021), nQMaker: estimating time non-reversible amino acid substitution models. <https://doi.org/10.1101/2021.10.18.464754>
31+
32+
## Estimating a model from a single concatenated alignment
33+
### Estimating a reversible model
34+
We first demonstrate the estimation of a reversible model for a clade-specific dataset. Please download and extract the [sample training data](data/plant_10loci.zip). This sample data was exctracted from plant data ([Ran et al., 2018]). There are two files in the downloaded folder:
35+
36+
* `alignment.nex` contains the alignment in NEXUS format.
37+
* `train.nex` contains the training partitions in NEXUS format.
38+
39+
The estimation (training) then can be accomplished with three commands in IQ-TREE. The first command is:
40+
41+
# step 1: infer an single edge-linked tree with reversible models as initial models
42+
iqtree2 --seed 1 -T AUTO -s alignment.nex -p train.nex -m MFP -mset LG,WAG,JTT -cmax 4 --prefix train_plant
43+
44+
- `-seed 1` sets the random seed.
45+
- `-T AUTO` sets the number of computing threads to automatically detection, `-T 10` will let IQ-TREE utilize up to 10 CPU threads.
46+
- `-s alignment.nex` specifies the NEXUS file containing the concatenated alignment.
47+
- `-p train.nex` specifies the NEXUS file containing the training loci, `-p` option will estimate an edge-linked partition model with a single tree topology shared across all loci. This -p option is typically used with concatenation tree estimation that assumes a single species tree but rescale the branch lengths of the locus trees. It was shown to perform best among other partition models ([Duchêne et al., 2019]).
48+
Note: If you don't have the partition file `train.nex`, simply ignore the `-p` option.
49+
- `-mset LG,WAG,JTT` defines the initial candidate matrices to reduce computational burden.
50+
- `-cmax 4` restricts up to four categories for the rate heterogeneity across sites.
51+
- `--prefix train_plant` or `-pre train_plant` sets the name of the output files.
52+
53+
This will run ModelFinder to find the best model for each loci. The best models and the best trees will be saved to `train_plant.best_model.nex` and to `train_plant.treefile`, respectively. These two files will be used as the input for the second command that estimate a join reversible matrix:
54+
55+
# step 2: estimate a join non-reversible matrix across all loci
56+
iqtree2 -seed 1 -T AUTO -s alignment.nex -p train_plant.best_model.nex -te train_plant.treefile --init-model LG --model-joint GTR20+FO --prefix plant_GTR20_FO
57+
- `--init-model LG` option specifies the initial matrix
58+
- `-p train_plant.best_model.nex` and `-te train_plant.treefile` options specify the best models and trees found in step 1.
59+
60+
The resulting matrix (`Q.plant`) which is included in the file `plant_GTR20_FO.iqtree` can be obtained with the command:
61+
62+
# step 3: extract the resulting reversible matrix
63+
grep -A 21 "can be used as input for IQ-TREE" plant_GTR20_FO.iqtree | tail -n20 > Q.plant
64+
65+
We can open `Q.plant` in a text viewer as below, or use with IQ-TREE (e.g. `-m Q.plant` option). The `Q.plant` is in PAML format, it contains the lower diagonal part of the AA exchange rates matrix and 20 AA frequencies.
66+
67+
0.120955
68+
0.048462 0.458565
69+
0.436023 0.037551 4.799928
70+
0.470505 1.707355 0.338537 0.000100
71+
0.261114 3.736017 0.605443 0.147890 0.286279
72+
0.837596 0.016734 0.388965 5.815020 0.000100 4.138646
73+
1.939708 1.199364 0.911021 1.413132 0.681747 0.419571 0.886564
74+
0.232927 3.431828 4.195096 1.867861 1.850593 4.826298 0.328092 0.170808
75+
0.118245 0.101767 0.246939 0.041486 0.186430 0.018556 0.000100 0.000100 0.015294
76+
0.137216 0.190052 0.000100 0.000100 0.362263 0.642152 0.028449 0.025460 0.554159 2.930110
77+
0.135127 5.915805 3.499350 0.090416 0.000100 2.539190 1.229355 0.196033 0.301659 0.149798 0.013760
78+
0.359701 0.983896 0.049506 0.000100 0.380313 0.236000 0.389444 0.000100 0.031711 5.188215 4.848803 0.701453
79+
0.203942 0.034729 0.000100 0.000100 1.671509 0.000100 0.000100 0.048554 0.091040 1.015749 3.012478 0.000154 0.873360
80+
1.317722 0.441669 0.000100 0.128632 0.000100 1.357928 0.123957 0.048531 0.824914 0.000100 0.502828 0.122503 0.000100 0.000100
81+
3.425467 1.067776 4.114253 0.435618 2.788606 0.280845 0.104001 1.549784 0.361033 0.041339 0.384561 0.290811 0.188779 0.615963 2.474679
82+
4.260751 0.715157 2.261495 0.178679 0.000100 0.277338 0.227034 0.186275 0.000100 1.909182 0.041874 0.966718 3.385793 0.072074 0.806596 4.938908
83+
0.245284 0.698218 0.000100 0.000100 1.273897 0.161369 0.000100 0.405661 0.536027 0.012605 0.896675 0.000100 0.341461 0.993273 0.031731 0.131122 0.000100
84+
0.012568 0.009986 0.621948 0.200102 2.122422 0.031013 0.010158 0.000100 5.748250 0.031443 0.000100 0.000100 0.156847 7.335297 0.173676 0.241389 0.000100 1.228204
85+
2.630085 0.043992 0.071593 0.042649 0.326353 0.036449 0.277692 0.162678 0.140434 9.733184 1.730443 0.035985 1.921040 0.678498 0.149578 0.089526 0.989870 0.000100 0.234116
86+
87+
0.076028 0.051084 0.039819 0.051496 0.012072 0.038467 0.064409 0.052997 0.017632 0.064148 0.102707 0.073142 0.019406 0.048651 0.034180 0.077354 0.046007 0.011774 0.033201 0.085428
88+
89+
The amino-acid order in this file is:
90+
91+
A R N D C Q E G H I L K M F P S T W Y V
92+
Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val
93+
94+
95+
### Estimating a non-reversible model
96+
To estimate a non-reversible model, we use `--model-joint NONREV+FO` option instead of `--model-joint GTR20+FO` (in step 2). The three commands now are:
97+
98+
# step 1: infer an single edge-linked tree with reversible models as initial models
99+
iqtree2 --seed 1 -T AUTO -s alignment.nex -p train.nex -m MFP -mset LG,WAG,JTT -cmax 4 --prefix train_plant
100+
101+
# step 2: estimate a join non-reversible matrix across all loci
102+
iqtree2 -seed 1 -T AUTO -s alignment.nex -p train_plant.best_model.nex -te train_plant.treefile --model-joint NONREV+FO --prefix plant_NONREV_FO
103+
104+
# step 3: extract the resulting non-reversible matrix
105+
grep -A 22 "can be used as input for IQ-TREE" plant_NONREV_FO.iqtree | tail -n21 > NQ.plant
106+
107+
The resulting `NQ.plant` matrix may now look like:
108+
109+
-1.061624 0.007785 0.002655 0.022039 0.006589 0.014742 0.062839 0.090131 0.001704 0.002323 0.003087 0.001751 0.012195 0.012809 0.041760 0.320267 0.210023 0.003323 0.000857 0.244743
110+
0.008289 -1.080907 0.020365 0.000987 0.026624 0.177551 0.001802 0.086672 0.075705 0.009088 0.021643 0.491866 0.024868 0.000653 0.006541 0.072228 0.039720 0.013123 0.000986 0.002196
111+
0.004032 0.019786 -1.313687 0.267488 0.005393 0.008927 0.032858 0.067708 0.062829 0.023892 0.000094 0.252395 0.000094 0.000094 0.000094 0.438096 0.096599 0.000094 0.027919 0.005295
112+
0.041666 0.003896 0.227284 -0.890796 0.000094 0.000094 0.400738 0.099378 0.050443 0.002720 0.000094 0.007025 0.000094 0.000094 0.000094 0.027181 0.013001 0.000094 0.009413 0.007395
113+
0.051903 0.094902 0.010786 0.000094 -0.893500 0.026469 0.000094 0.037475 0.106883 0.010559 0.070577 0.000094 0.000094 0.102084 0.000094 0.236907 0.000094 0.012537 0.099632 0.032222
114+
0.013065 0.194182 0.039046 0.017941 0.000305 -1.075698 0.292962 0.027675 0.095402 0.001329 0.081175 0.209181 0.004369 0.000094 0.062599 0.012804 0.017520 0.000094 0.000094 0.005861
115+
0.064413 0.000094 0.012826 0.355911 0.000094 0.181070 -0.822679 0.053530 0.003000 0.000094 0.002512 0.097327 0.005370 0.000094 0.005713 0.002772 0.012020 0.000094 0.000094 0.025650
116+
0.187614 0.053999 0.033263 0.066321 0.009740 0.014380 0.058840 -0.621387 0.003268 0.000094 0.005951 0.015482 0.000094 0.000544 0.000094 0.145278 0.009973 0.000144 0.000094 0.016213
117+
0.033332 0.235841 0.278083 0.091410 0.023465 0.257662 0.049843 0.011802 -1.428426 0.004130 0.053383 0.021902 0.000094 0.007900 0.054172 0.039777 0.000094 0.000094 0.250585 0.014858
118+
0.023950 0.001632 0.004395 0.002772 0.003342 0.000717 0.000094 0.000837 0.000094 -1.493125 0.272227 0.010394 0.124246 0.059446 0.000094 0.003597 0.087965 0.000297 0.003057 0.893969
119+
0.026527 0.011287 0.000094 0.000094 0.002177 0.021633 0.001975 0.000094 0.012033 0.226203 -0.745614 0.000094 0.089496 0.149580 0.006638 0.026123 0.003215 0.013687 0.002033 0.152631
120+
0.017415 0.341123 0.171696 0.006194 0.000094 0.105871 0.089788 0.009665 0.007083 0.009163 0.002472 -0.829488 0.013473 0.000094 0.003097 0.013503 0.038477 0.000094 0.000094 0.000094
121+
0.000094 0.043168 0.007303 0.000094 0.011450 0.011888 0.047858 0.000094 0.004725 0.350677 0.567158 0.061171 -1.532577 0.026166 0.000094 0.027269 0.182166 0.012220 0.014137 0.164845
122+
0.007119 0.004300 0.000094 0.000094 0.023494 0.000094 0.002681 0.008815 0.000094 0.054752 0.348000 0.001664 0.026142 -0.836377 0.000094 0.051269 0.011247 0.006273 0.217177 0.072974
123+
0.115066 0.035090 0.000094 0.011471 0.000094 0.046128 0.005991 0.007020 0.013780 0.000094 0.071513 0.009870 0.000094 0.000094 -0.550036 0.167147 0.048381 0.001156 0.005462 0.011491
124+
0.238188 0.080851 0.127209 0.032161 0.040145 0.018437 0.014191 0.071781 0.006455 0.002646 0.050029 0.035037 0.001082 0.032565 0.125243 -1.151350 0.243892 0.000094 0.012692 0.018650
125+
0.361018 0.036699 0.129824 0.003747 0.000094 0.010391 0.015700 0.012775 0.000630 0.135648 0.002005 0.101363 0.080632 0.000094 0.013000 0.417972 -1.388325 0.000094 0.000094 0.066546
126+
0.014117 0.017973 0.000094 0.000094 0.025311 0.008866 0.000094 0.034917 0.015069 0.000094 0.068188 0.000094 0.000094 0.037263 0.000094 0.016805 0.000094 -0.251399 0.012044 0.000094
127+
0.000094 0.000094 0.010861 0.001684 0.031664 0.000094 0.001058 0.000094 0.132692 0.000094 0.000094 0.000094 0.000094 0.405258 0.006474 0.003214 0.000094 0.007815 -0.601659 0.000094
128+
0.203068 0.005150 0.004281 0.000094 0.003602 0.000454 0.018986 0.008201 0.003052 0.638676 0.198251 0.009524 0.040070 0.033033 0.008064 0.000094 0.062714 0.000094 0.007267 -1.244676
129+
130+
0.076646 0.049413 0.038372 0.049451 0.010780 0.037824 0.063761 0.052468 0.015186 0.065770 0.104298 0.072672 0.019435 0.049968 0.035179 0.078294 0.045874 0.013490 0.033377 0.087742
131+
132+
Note: To assess the statistical support of the root position with bootstraping (-B 1000 option), users can use [this tutorial](Rootstrap).
133+
134+
## Estimating a model from a folder of alignments
135+
### Estimating a reversible model
136+
We will now estimate a reversible model from a folder of alignments. Please first download the file [plant_10alignments.zip](data/plant_10alignments.zip). There is a sub-folder named `train_plant` in the downloaded folder. We use `-S` option instead of `-s` and `-p` options to allow each alignment having a separate tree. This -S option is typically used with a folder of alignments. The three commands are:
137+
138+
# step 1: infer a separate tree for each alignment with reversible models as initial models
139+
iqtree2 -seed 1 -T AUTO -S train_plant -mset LG,WAG,JTT -cmax 4 -pre train_plant
140+
141+
# step 2: estimate a join reversible matrix across all alignments
142+
iqtree2 -seed 1 -T AUTO -S train_plant.best_model.nex -te train_plant.treefile --model-joint GTR20+FO --init-model LG -pre train_plant.GTR20
143+
144+
# step 3: extract the resulting reversible matrix
145+
grep -A 21 "can be used as input for IQ-TREE" train_plant.GTR20.iqtree | tail -n20 > Q.plant
146+
147+
### Estimating a non-reversible model
148+
Same as estimating a non-reversible model from a single concatenated alignment, we change `--model-joint GTR20+FO` option (in step 2) to `--model-joint NONREV+FO`.
149+
150+
# step 1: infer a separate tree for each alignment with reversible models as initial models
151+
iqtree2 -seed 1 -T AUTO -S train_plant -mset LG,WAG,JTT -cmax 4 -pre train_plant
152+
153+
# step 2: estimate a join non-reversible matrix across all alignments
154+
iqtree2 -seed 1 -T AUTO -S train_plant.best_model.nex -te train_plant.treefile --model-joint NONREV+FO --init-model LG -pre train_plant.NONREV
155+
156+
# step 3: extract the resulting non-reversible matrix
157+
grep -A 22 "can be used as input for IQ-TREE" train_plant.NONREV.iqtree | tail -n21 > NQ.plant
158+
159+
[Dang et al., 2021]: https://doi.org/10.1101/2021.10.18.464754
160+
[Minh et al., 2021]: https://doi.org/10.1093/sysbio/syab010
161+
[Naser-Khdour et al., 2021]: https://doi.org/10.1093/sysbio/syab067
162+
[El-Gebali et al., 2018]: https://doi.org/10.1093/nar/gky995
163+
[Duchêne et al., 2019]: https://doi.org/10.1093/molbev/msz291
164+
[Ran et al., 2018]: https://doi.org/10.1098/rspb.2018.1012

0 commit comments

Comments
 (0)