Commit d4485f5

Merge branch 'iqtree:master' into master
2 parents 71908da + a09fea4 commit d4485f5

3 files changed

Lines changed: 274 additions & 279 deletions

doc/Dating.md

Lines changed: 30 additions & 279 deletions
@@ -1,8 +1,8 @@
 ---
 layout: userdoc
 title: "Phylogenetic Dating"
-author: Minh Bui, Piyumal Demotte, Rob Lanfear
-date: 2025-02-27
+author: Piyumal Demotte, Minh Bui, Rob Lanfear
+date: 2025-05-05
 docid: 7
 icon: info-circle
 doctype: tutorial
@@ -42,14 +42,14 @@ Phylogenetic Dating
 Bayesian dating with MCMCtree
 ------------------------------------------------------------
 
-From IQ-TREE 2.5 onwards, we provide the functionality in IQ-TREE to infer time trees
+From IQ-TREE version 3.0.1 onwards, we provide the functionality in IQ-TREE to infer time trees
 using Bayesian MCMCtree method.
 
 If you use this feature, please cite:
 
-> __P. Demotte, M. Panchaksaram, N. Ly-Trong, M. dos Reis and B.Q. Minh__
+> __P. Demotte, M. Panchaksaram, H. Kumarasinghe, N. Ly-Trong, M. dos Reis and B.Q. Minh__
 >(2025) IQ2MC: A New Framework to Infer Phylogenetic Time Trees Using IQ-TREE
->and MCMCtree.
+>and MCMCtree with Mixture Models. Submitted.
 
 IQ2MC workflow for time tree inference
 --------------------------------------
@@ -90,7 +90,7 @@ If the alignment file is called `example.phy` and the rooted tree file is called
 `example_tree.nwk`,
 
 ```
-iqtree -s example.phy -m GTR+G4 -te example_tree.nwk --dating mcmctree --prefix example
+iqtree3 -s example.phy -m GTR+G4 -te example_tree.nwk --dating mcmctree --prefix example
 ```
 
 
@@ -118,7 +118,7 @@ You can specify more parameters in the workflow to generate the control file
 accurately for the analysis with IQ-TREE.
 
 ```
-iqtree -s example.phy -m GTR+G4 -te example_tree.nwk --dating mcmctree --mcmc-iter 20000,200,50000 --mcmc-bds 1,1,0.5 --mcmc-clock IND
+iqtree3 -s example.phy -m GTR+G4 -te example_tree.nwk --dating mcmctree --mcmc-iter 20000,200,50000 --mcmc-bds 1,1,0.5 --mcmc-clock IND
 ```
 
 * `--mcmc-iter burnin,samplefreq,nsample` : use to set number of burin samples,
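
To make the `--mcmc-iter burnin,samplefreq,nsample` triple in the command above concrete, the following shell sketch splits the value and computes the chain length it implies. It assumes MCMCtree's usual convention that the total number of iterations is burnin + samplefreq × nsample; the variable names are illustrative and not part of IQ-TREE.

```shell
# Value taken from the example command above: --mcmc-iter 20000,200,50000
mcmc_iter="20000,200,50000"

# Split "burnin,samplefreq,nsample" into three shell variables.
IFS=, read -r burnin samplefreq nsample <<EOF
$mcmc_iter
EOF

# Assumption: MCMCtree runs burnin + samplefreq * nsample iterations in total.
total=$((burnin + samplefreq * nsample))
echo "burn-in: $burnin, sampled every: $samplefreq, samples kept: $nsample, total iterations: $total"
```

Raising `samplefreq` or `nsample` lengthens the run roughly in proportion, which is worth keeping in mind when you increase them to improve convergence.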
@@ -134,56 +134,35 @@ Currently supported clocks models are EQUAL: global clock with equal rates, IND:
 independent rates model with independent rates across lineages and CORR:
 correlated clock model with auto-correlated rates across the lineages.
 
-Using partitions and Mixture models for approximate likelihood dating
+Using partitions and mixture models for approximate likelihood dating
 ---------------------------------------------------------------------
 
 IQ-TREE supports three partition models for approximate likelihood dating. Under
-the Edge-unlinked (EUL) model, IQ-TREE generates the Hessian file which contains
+the Edge-unlinked (EUL) model (`-Q` option), IQ-TREE generates the Hessian file which contains
 separate gradients and Hessian for each partition. For the Edge-linked (EL)
-partition model, the Hessian file contains only one gradient vector and a
-Hessian as branches are shared across partitions.
-
-Since IQ-TREE supports RAxML and NEXUS style partitions input file, you can use
-partitions defined in the following format.
-
-```
-DNA, part1 = 1-100
-DNA, part2 = 101-450
-```
-If your partition file is called `example.nex`,
-
-```
-iqtree -s example.phy -Q example.nex -m GTR+G4 -te example_tree.nwk --dating mcmctree
-```
-
-Here, IQ-TREE generates the Hessian file using the `GTR+G4` model for all
-partitions. If you need to use different models for each partition, you need to
-create a more flexible NEXUS file like the following.
+partition model (`-p` option), the Hessian file contains only one gradient vector and a
+Hessian as branches are shared across partitions. See [Complex Models](Complex-Models)
+for how to specify partition and mixture models. If your partition file is called `example.nex`:
 
 ```
-#nexus
-begin sets;
-charset part1 = 1-100;
-charset part2 = 101-450;
-charpartition mine = GTR+G4:part1, HKY:part2;
-end;
+# -Q option is to specify egde-unlinked partition model
+iqtree3 -s example.phy -Q example.nex -m GTR+G4 -te example_tree.nwk --dating mcmctree
 ```
-Here, IQ-TREE uses `GTR+G4` model for partition 1, and `HKY` model for partition
-2 respectively. Using `-q` and `-p` options, you can generate the Hessian file
-which considers `edge-linked equal branch partition models` and `edge-linked
-proportional branch length models` respectively.
 
 IQ-TREE also supports mixture models for the Hessian file generation. You can
 simply specify DNA or Amino Acid Mixture model as following,
 
 ```
-iqtree -s example.phy -m "MIX{GTR,HKY}+G4" -te example_tree.nwk –-dating mcmctree
+iqtree3 -s example.phy -m "MIX{GTR,HKY}+G4" -te example_tree.nwk –-dating mcmctree
 ```
+(Or you can also invoke [MixtureFinder](Complex-Models#mixturefinder) with `-m MIX+MFP` to determine mixture models automatically).
+
 If you need to use an Amino Acid profile mixture model such as C60 model,
 
 ```
-iqtree -s example.phy -m LG+G4+C60 -te example_tree.nwk –-dating mcmctree
+iqtree3 -s example_aa.phy -m LG+G4+C60 -te example_aa_tree.nwk –-dating mcmctree
 ```
+
 If you are using ModelFinder or MixtureFinder, you need to follow a two-step
 approach. First, you can estimate the best-fit model for the data using
 ModelFinder or MixtureFinder. Then, the Hessian file can be generated using
@@ -192,17 +171,19 @@ ModelFinder or MixtureFinder. Then, the Hessian file can be generated using
 How to run MCMCtree
 -------------------
 
-You can directly run MCMCtree from the control file generated by IQ-TREE in step
-2. The command to run MCMCtree with the control file is,
+You need to download a modified version of MCMCTree from <https://github.com/iqtree/paml>.
+This version has some changes to make the workflow more convenient.
+You can then directly run MCMCtree from the control file generated by IQ-TREE in step 2.
+The command to run MCMCtree with the control file is:
 
 ```
 mcmctree example.mcmctree.ctl
 ```
 
 
-The control file generated by IQ-TREE has the following format. You can simply
-edit the control file as necessary. For an example you may need to increase
-burin and sample frequency for MCMC convergence.
+The control file generated by IQ-TREE has the following format. You can
+edit the control file before running `mcmctree` as necessary. For example, you can increase
+burnin and sample frequency for MCMC convergence.
 
 ```
 seed = -1 * The computer’s current time is used when seed < 0.
@@ -252,8 +233,8 @@ alpha_gamma = 1 1 * alpha and beta parameter of Gamma distribution for heter
 
 Note that, if you generate the `hessain file` from IQ-TREE, it is necessary to
 use the rooted tree file generated by IQ-TREE to be used in MCMCtree. The
-`ckpfile` and `hessianfile` options are new and only work for the PAML release
-in IQ-TREE (https://github.com/iqtree/paml). If you use another MCMCtree
+`ckpfile` and `hessianfile` options are new and only work with our modified
+[PAML code](https://github.com/iqtree/paml). If you use another MCMCtree
 version/release, you can simply remove those options from control file and
 rename the `hessian file` to `in.BV` to run MCMCtree without any errors.
 
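
The fallback for another MCMCtree release described above can be sketched in shell as follows. The helper name `strip_iqtree_options`, the output file name `example.stock.ctl` and the Hessian file name `example.mcmctree.hessian` are assumptions for illustration, so check the names that IQ-TREE actually wrote for your `--prefix`. The sketch also assumes that each option sits on its own `name = value` line in the control file.

```shell
# Hypothetical helper: copy a control file while dropping the two options
# that only the modified MCMCtree understands.
#   $1: control file written by IQ-TREE
#   $2: control file to write for a stock MCMCtree
strip_iqtree_options() {
    grep -v -E '^[[:space:]]*(ckpfile|hessianfile)[[:space:]]*=' "$1" > "$2"
}

# Example usage (all file names below are assumptions):
#   strip_iqtree_options example.mcmctree.ctl example.stock.ctl
#   cp example.mcmctree.hessian in.BV    # the text says "rename"; a copy keeps the original
#   mcmctree example.stock.ctl
```

Writing to a new control file instead of editing in place leaves the IQ-TREE output untouched, so the same run can still be repeated with the modified MCMCtree.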

@@ -471,236 +452,6 @@ to control the way that LSD2 treats outliers, you can do this:
 iqtree -s ALN_FILE --date DATE_FILE --date-options "-e 2"
 
 A full list of the options for LSD2 can be obtained by downloading LSD2 and
-running `lsd2 -h`, the output of that command is reproduced here for
-convenience:
+running `lsd2 -h`, the output of that command is [provided here](lsd2-help) for
+your convenience.
 
-```
-LSD: LEAST-SQUARES METHODS TO ESTIMATE RATES AND DATES - v.1.8
-
-DESCRIPTION
-This program estimates the rate and the dates of the input phylogenies given
-some temporal constraints.
-It minimizes the square errors of the branch lengths under normal
-distribution model.
-
-SYNOPSIS
-./lsd [-i inputFile] [-d inputDateFile] [-o outputFile] [-s sequenceLength]
-[-g outgroupFile] [-f nbSamplings]
-OPTIONS
--a rootDate
-To specify the root date if there's any. If the root date is not a
-number, but a string (ex: 2020-01-10, or b(2019,2020)) then it should
-be put between the quotes.
--b varianceParameter
-The parameter (between 0 and 1) to compute the variances in option -v. It
-is the pseudo positive constant to add to the branch lengths
-when calculating variances, to adjust the dependency of variances to
-branch lengths. By default b is the maximum between median branch length
-and 10/seqlength; but it should be adjusted based on how/whether the
-input tree is relaxed or strict. The smaller it is the more variances
-would be linear to branch lengths, which is relevant for strict clock.
-The bigger it is the less effect of branch lengths on variances,
-which might be better for relaxed clock.
--d inputDateFile
-This options is used to read the name of the input date file which
-contains temporal constraints of internal nodes
-or tips. An internal node can be defined either by its label (given in
-the input tree) or by a subset of tips that have it as
-the most recent common ancestor (mrca). A date could be a real or a
-string or format year-month-day.
-The first line of this file is the number of temporal constraints. A
-temporal constraint can be fixed date, or a
-lower bound l(value), or an upper bound u(value), or an interval b(v1,v2)
-For example, if the input tree has 4 taxa a,b,c,d, and an internal node
-named n, then following is a possible date file:
-6
-a l(2003.12)
-b u(2007.07)
-c 2005
-d b(2001.2,2007.11)
-mrca(a,b,c,d) b(2000,2001)
-n l(2004.3)
-If this option is omitted, and option -a, -z are also omitted, the
-program will estimate relative dates by giving T[root]=0 and T[tips]=1.
--D outDateFormat
-Specify output date format: 1 for real, 2 for year-month-day. By default
-the program will guess the format of input dates and uses it for
-output dates.
--e ZscoreOutlier
-This option is used to estimate and exclude outlier nodes before dating
-process.
-LSD2 normalize the branch residus and decide a node is outlier if its
-related residus is great than the ZscoreOutlier.
-A normal value of ZscoreOutliercould be 3, but you can adjust it
-bigger/smaller depending if you want to have
-less/more outliers. Note that for now, some functionalities could not be
-combined with outliers estimation, for example
-estimating multiple rates, imprecise date constraints.
--f samplingNumberCI
-This option calculates the confidence intervals of the estimated rate and
-dates. The branch lengths of the esimated
-tree are sampled samplingNumberCI times to generate a set of simulated
-trees. To generate simulated lengths
-for each branch, we use a Poisson distribution whose mean equals to the
-estimated one multiplied by the sequence length, which is
-1000 by default if nothing was specified via option -s. Long sequence
-length tends to give small confidence intervals. To avoid
-over-estimate the confidence intervals in the case of very long sequence
-length but not necessarily strict molecular clock, you
-could use a smaller sequence length than the actual ones. Confidence
-intervals are written in the nexus tree with label CI_height,
-and can be visualzed with Figtree under Node bar feature.
--g outgroupFile
-If your data contain outgroups, then specify the name of the outgroup
-file here. The program will use the outgroups to root the trees.
-If you use this combined with options -G, then the outgroups will be
-removed. The format of this file should be:
-n
-OUTGROUP1
-OUTGROUP2
-...
-OUTGROUPn
--F
-By default without this option, we impose the constraints that the date
-of every node is equal or smaller then the
-dates of its descendants, so the running time is quasi-linear. Using this
-option we ignore this temporal constraints, and
-the the running time becomes linear, much faster.
--h help
-Print this message.
--i inputTreesFile
-The name of the input trees file. It contains tree(s) in newick format,
-each tree on one line. Note that the taxa sets of all
-trees must be the same.
--j
-Verbose mode for output messages.
--G
-Use this option to remove the outgroups (given in option -g) in the
-estimated tree. If this option is not used, the outgroups
-will be kept and the root position in estimated on the branch defined by
-the outgroups.
--l nullBlen
-A branch in the input tree is considered informative if its length is
-greater this value. By default it is 0.5/seq_length. Only
-informative branches are forced to be bigger than a minimum branch length
-(see option -u for more information about this).
--m samplingNumberOutlier
-The number of dated nodes to be sampled when detecting outlier nodes.
-This should be smaller than the number of dated nodes,
-and is 10 by default.
--n datasetNumber
-The number of trees that you want to read and analyse.
--o outputFile
-The base name of the output files to write the results and the time-scale
-trees.
--p partitionFile
-The file that defines the partition of branches into multiple subsets in
-the case that you know each subset has a different rate.
-In the partition file, each line contains the name of the group, the
-prior proportion of the group rate compared to the main rate
-(selecting an appropriate value for this helps to converge faster), and a
-list of subtrees whose branches are supposed to have the
-same substitution rate. All branches that are not assigned to any subtree
-form a group having another rate.
-A subtree is defined between {}: its first node corresponds to the root
-of the subtree, and the following nodes (if there any)
-correspond to the tips of the subtree. If the first node is a tip label
-then it takes the mrca of all tips as the root of the subtree.
-If the tips of the subtree are not defined (so there's only the defined
-root), then by
-default this subtree is extended down to the tips of the full tree. For
-example the input tree is
-((A:0.12,D:0.12)n1:0.3,((B:0.3,C:0.5)n2:0.4,(E:0.5,(F:0.2,G:0.3)n3:0.33)
-n4:0.22)n5:0.2)root;
-and you have the following partition file:
-group1 1 {n1} {n5 n4}
-group2 1 {n3}
-then there are 3 rates: the first one includes the branches (n1,A),
-(n1,D), (n5,n4), (n5,n2), (n2,B), (n2,C); the second one
-includes the branches (n3,F), (n3,G), and the last one includes all the
-remaining branches. If the internal nodes don't have labels,
-then they can be defined by mrca of at least two tips, for example n1 is
-mrca(A,D)
--q standardDeviationRelaxedClock
-This value is involved in calculating confidence intervals to simulate a
-lognormal relaxed clock. We multiply the simulated branch lengths
-with a lognormal distribution with mean 1, and standard deviation q. By
-default q is 0.2. The bigger q is, the more your tree is relaxed
-and give you bigger confidence intervals.
--r rootingMethod
-This option is used to specify the rooting method to estimate the
-position of the root for unrooted trees, or
-re-estimate the root for rooted trees. The principle is to search for the
-position of the root that minimizes
-the objective function.
-Use -r l if your tree is rooted, and you want to re-estimate the root
-locally around the given root.
-Use -r a if you want to estimate the root on all branches (ignoring the
-given root if the tree is rooted).
-In this case, if the constrained mode is chosen (option -c), method
-"a" first estimates the root without using the constraints.
-After that, it uses the constrained mode to improve locally the
-position of the root around this pre-estimated root.
-Use -r as if you want to estimate to root using constrained mode on all
-branches.
-Use -r k if you want to re-estimate the root position on the same branche
-of the given root.
-If combined with option -g, the root will be estimated on the branche
-defined by the outgroups.
--R round_time
-This value is used to round the minimum branch length of the time scaled
-tree. The purpose of this is to make the minimum branch length
-a meaningful time unit, such as day, week, year ... By default this value
-is 365, so if the input dates are year, the minimum branch
-length is rounded to day. The rounding formula is round(R*minblen)/R.
--s sequenceLength
-This option is used to specify the sequence length when estimating
-confidence intervals (option -f). It is used to generate
-integer branch lengths (number of substitutions) by multiplying this with
-the estimated branch lengths. By default it is 1000.
--S minSupport
-Together with collapsing internal short branches (see option -l), users
-can also collapse internal branches having weak support values (if
-provided in the input tree) by using this option. The program will
-collapse all internal branches having support <= the specifed value.
--t rateLowerBound
-This option corresponds to the lower bound for the estimating rate. It is
-1e-10 by default.
--u minBlen
-By default without this option, lsd2 forces every branch of the time
-scaled tree to be greater than 1/(seq_length*rate) where rate is
-an pre-estimated median rate. This value is rounded to the number of days
-or weeks or years, depending on the rounding parameter -R.
-By using option -u, the program will not estimate the minimum branch
-length but use the specified value instead.
--U minExBlen
-Similar to option -u but applies for external branches if specified. If
-it's not specified then the minimum branch length of external
-branches is set the same as the one of internal branch.
--v variance
-Use this option to specify the way you want to apply variances for the
-branch lengths. Variances are used to recompense big errors on
-long estimated branch lengths. The variance of the branch Bi is Vi =
-(Bi+b) where b is specified by option -b.
-If variance=0, then we don't use variance. If variance=1, then LSD uses
-the input branch lengths to calculate variances.
-If variance=2, then LSD runs twice where the second time it calculates
-the variances based on the estimated branch
-lengths of the first run. By default variance=1.
--V
-Get the actual version.
--w givenRte
-This option is used to specify the name of the file containing the
-substitution rates.
-In this case, the program will use the given rates to estimate the dates
-of the nodes.
-This file should have the following format
-RATE1
-RATE2
-...
-where RATEi is the rate of the tree i in the inputTreesFile.
--z tipsDate
-To specify the tips date if they are all equal. If the tips date is not a
-number, but a string (ex: 2020-01-10, or b(2019,2020))
-then it should be put between the quotes.
-```

doc/iqtree-doc.pdf

-49.1 KB
Binary file not shown.

0 commit comments

Comments
 (0)