11---
22layout : userdoc
33title : " Phylogenetic Dating"
4- author : Minh Bui, Piyumal Demotte , Rob Lanfear
5- date : 2025-02-27
4+ author : Piyumal Demotte, Minh Bui , Rob Lanfear
5+ date : 2025-05-05
66docid : 7
77icon : info-circle
88doctype : tutorial
@@ -42,14 +42,14 @@ Phylogenetic Dating
4242Bayesian dating with MCMCtree
4343------------------------------------------------------------
4444
45- From IQ-TREE 2.5 onwards, we provide the functionality in IQ-TREE to infer time trees
45+ From IQ-TREE version 3.0.1 onwards, we provide the functionality in IQ-TREE to infer time trees
4646using the Bayesian MCMCtree method.
4747
4848If you use this feature, please cite:
4949
50- > __ P. Demotte, M. Panchaksaram, N. Ly-Trong, M. dos Reis and B.Q. Minh__
50+ > __ P. Demotte, M. Panchaksaram, H. Kumarasinghe, N. Ly-Trong, M. dos Reis and B.Q. Minh__
5151> (2025) IQ2MC: A New Framework to Infer Phylogenetic Time Trees Using IQ-TREE
52- > and MCMCtree.
52+ > and MCMCtree with Mixture Models. Submitted.
5353
5454IQ2MC workflow for time tree inference
5555--------------------------------------
@@ -90,7 +90,7 @@ If the alignment file is called `example.phy` and the rooted tree file is called
9090` example_tree.nwk ` ,
9191
9292```
93- iqtree -s example.phy -m GTR+G4 -te example_tree.nwk --dating mcmctree --prefix example
93+ iqtree3 -s example.phy -m GTR+G4 -te example_tree.nwk --dating mcmctree --prefix example
9494```
9595
9696
@@ -118,7 +118,7 @@ You can specify more parameters in the workflow to generate the control file
118118accurately for the analysis with IQ-TREE.
119119
120120```
121- iqtree -s example.phy -m GTR+G4 -te example_tree.nwk --dating mcmctree --mcmc-iter 20000,200,50000 --mcmc-bds 1,1,0.5 --mcmc-clock IND
121+ iqtree3 -s example.phy -m GTR+G4 -te example_tree.nwk --dating mcmctree --mcmc-iter 20000,200,50000 --mcmc-bds 1,1,0.5 --mcmc-clock IND
122122```
123123
124124* ` --mcmc-iter burnin,samplefreq,nsample ` : use to set the number of burn-in samples,
@@ -134,56 +134,35 @@ Currently supported clocks models are EQUAL: global clock with equal rates, IND:
134134independent rates model with independent rates across lineages and CORR:
135135correlated clock model with auto-correlated rates across the lineages.
136136
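The options above can be assembled programmatically. The following is a minimal sketch, not part of IQ-TREE: the helper name `build_dating_command` and its arguments are hypothetical, and it only uses the flags and clock names (`EQUAL`, `IND`, `CORR`) mentioned in this document.

```python
import shlex

# Clock models named in this document for --mcmc-clock.
SUPPORTED_CLOCKS = {"EQUAL", "IND", "CORR"}


def build_dating_command(alignment, tree, model, burnin, samplefreq, nsample,
                         bds, clock, exe="iqtree3"):
    """Return a shell command string for the IQ2MC step (hypothetical helper).

    bds is a sequence of three numbers passed through to --mcmc-bds.
    """
    if clock not in SUPPORTED_CLOCKS:
        raise ValueError(f"unsupported clock model: {clock!r}")
    args = [
        exe, "-s", alignment, "-m", model, "-te", tree,
        "--dating", "mcmctree",
        "--mcmc-iter", f"{burnin},{samplefreq},{nsample}",
        "--mcmc-bds", ",".join(str(x) for x in bds),
        "--mcmc-clock", clock,
    ]
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return shlex.join(args)


if __name__ == "__main__":
    print(build_dating_command("example.phy", "example_tree.nwk", "GTR+G4",
                               20000, 200, 50000, (1, 1, 0.5), "IND"))
```

Rejecting unknown clock names before the run starts is a design choice of this sketch; IQ-TREE performs its own validation.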
137- Using partitions and Mixture models for approximate likelihood dating
137+ Using partitions and mixture models for approximate likelihood dating
138138---------------------------------------------------------------------
139139
140140IQ-TREE supports three partition models for approximate likelihood dating. Under
141- the Edge-unlinked (EUL) model, IQ-TREE generates the Hessian file which contains
141+ the Edge-unlinked (EUL) model (`-Q` option), IQ-TREE generates the Hessian file which contains
142142separate gradients and Hessian for each partition. For the Edge-linked (EL)
143- partition model, the Hessian file contains only one gradient vector and a
144- Hessian as branches are shared across partitions.
145-
146- Since IQ-TREE supports RAxML and NEXUS style partitions input file, you can use
147- partitions defined in the following format.
148-
149- ```
150- DNA, part1 = 1-100
151- DNA, part2 = 101-450
152- ```
153- If your partition file is called ` example.nex ` ,
154-
155- ```
156- iqtree -s example.phy -Q example.nex -m GTR+G4 -te example_tree.nwk --dating mcmctree
157- ```
158-
159- Here, IQ-TREE generates the Hessian file using the ` GTR+G4 ` model for all
160- partitions. If you need to use different models for each partition, you need to
161- create a more flexible NEXUS file like the following.
143+ partition model (`-p` option), the Hessian file contains only one gradient vector and a
144+ Hessian as branches are shared across partitions. See [Complex Models](Complex-Models)
145+ for how to specify partition and mixture models. If your partition file is called `example.nex`:
162146
163147```
164- #nexus
165- begin sets;
166- charset part1 = 1-100;
167- charset part2 = 101-450;
168- charpartition mine = GTR+G4:part1, HKY:part2;
169- end;
148+ # The -Q option specifies an edge-unlinked partition model
149+ iqtree3 -s example.phy -Q example.nex -m GTR+G4 -te example_tree.nwk --dating mcmctree
170150```
171- Here, IQ-TREE uses ` GTR+G4 ` model for partition 1, and ` HKY ` model for partition
172- 2 respectively. Using ` -q ` and ` -p ` options, you can generate the Hessian file
173- which considers ` edge-linked equal branch partition models ` and `edge-linked
174- proportional branch length models` respectively.
175151
176152IQ-TREE also supports mixture models for the Hessian file generation. You can
177153simply specify a DNA or amino acid mixture model as follows:
178154
179155```
180- iqtree -s example.phy -m "MIX{GTR,HKY}+G4" -te example_tree.nwk –-dating mcmctree
156+ iqtree3 -s example.phy -m "MIX{GTR,HKY}+G4" -te example_tree.nwk --dating mcmctree
181157```
158+ (You can also invoke [MixtureFinder](Complex-Models#mixturefinder) with `-m MIX+MFP` to determine mixture models automatically.)
159+
182160If you need to use an amino acid profile mixture model such as the C60 model:
183161
184162```
185- iqtree -s example .phy -m LG+G4+C60 -te example_tree .nwk –-dating mcmctree
163+ iqtree3 -s example_aa.phy -m LG+G4+C60 -te example_aa_tree.nwk --dating mcmctree
186164```
165+
187166If you are using ModelFinder or MixtureFinder, you need to follow a two-step
188167approach. First, you can estimate the best-fit model for the data using
189168ModelFinder or MixtureFinder. Then, the Hessian file can be generated using
@@ -192,17 +171,19 @@ ModelFinder or MixtureFinder. Then, the Hessian file can be generated using
192171How to run MCMCtree
193172-------------------
194173
195- You can directly run MCMCtree from the control file generated by IQ-TREE in step
196- 2 . The command to run MCMCtree with the control file is,
174+ You need to download a modified version of MCMCtree from <https://github.com/iqtree/paml>.
175+ This version has some changes to make the workflow more convenient.
176+ You can then directly run MCMCtree from the control file generated by IQ-TREE in step 2.
177+ The command to run MCMCtree with the control file is:
197178
198179```
199180mcmctree example.mcmctree.ctl
200181```
201182
202183
203- The control file generated by IQ-TREE has the following format. You can simply
204- edit the control file as necessary. For an example you may need to increase
205- burin and sample frequency for MCMC convergence.
184+ The control file generated by IQ-TREE has the following format. You can
185+ edit the control file before running ` mcmctree ` as necessary. For example, you can increase
186+ burnin and sample frequency for MCMC convergence.
206187
207188```
208189seed = -1 * The computer’s current time is used when seed < 0.
@@ -252,8 +233,8 @@ alpha_gamma = 1 1 * alpha and beta parameter of Gamma distribution for heter
252233
253234Note that, if you generate the `hessian file` from IQ-TREE, it is necessary to
254235use the rooted tree file generated by IQ-TREE in MCMCtree. The
255- ` ckpfile ` and ` hessianfile ` options are new and only work for the PAML release
256- in IQ-TREE (https://github.com/iqtree/paml ). If you use another MCMCtree
236+ `ckpfile` and `hessianfile` options are new and only work with our modified
237+ [PAML code](https://github.com/iqtree/paml). If you use another MCMCtree
257238version/release, you can simply remove those options from the control file and
258239rename the ` hessian file ` to ` in.BV ` to run MCMCtree without any errors.
259240
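Removing the two new options for use with another MCMCtree release can be scripted. The sketch below is not part of IQ-TREE or PAML: the helper name `drop_ctl_options` is hypothetical, and it assumes each option sits on its own `name = value * comment` line, as in the control file shown above.

```python
def drop_ctl_options(ctl_text, names=("ckpfile", "hessianfile")):
    """Return ctl_text without the lines that set any option in names.

    Assumes one "name = value * comment" entry per line.
    """
    kept = []
    for line in ctl_text.splitlines():
        # The option name is whatever precedes the first "=" on the line.
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in names:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # File names below are placeholders for illustration only.
    sample = (
        "seed = -1 * use current time\n"
        "ckpfile = example.ckp\n"
        "hessianfile = example.hessian\n"
        "burnin = 2000\n"
    )
    print(drop_ctl_options(sample), end="")
```

After writing the filtered text back to the control file, the Hessian file still has to be renamed to `in.BV` as described above.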
@@ -471,236 +452,6 @@ to control the way that LSD2 treats outliers, you can do this:
471452 iqtree -s ALN_FILE --date DATE_FILE --date-options "-e 2"
472453
473454A full list of the options for LSD2 can be obtained by downloading LSD2 and
474- running ` lsd2 -h ` , the output of that command is reproduced here for
475- convenience:
455+ running `lsd2 -h`; the output of that command is [provided here](lsd2-help) for
456+ your convenience.
476457
477- ```
478- LSD: LEAST-SQUARES METHODS TO ESTIMATE RATES AND DATES - v.1.8
479-
480- DESCRIPTION
481- This program estimates the rate and the dates of the input phylogenies given
482- some temporal constraints.
483- It minimizes the square errors of the branch lengths under normal
484- distribution model.
485-
486- SYNOPSIS
487- ./lsd [-i inputFile] [-d inputDateFile] [-o outputFile] [-s sequenceLength]
488- [-g outgroupFile] [-f nbSamplings]
489- OPTIONS
490- -a rootDate
491- To specify the root date if there's any. If the root date is not a
492- number, but a string (ex: 2020-01-10, or b(2019,2020)) then it should
493- be put between the quotes.
494- -b varianceParameter
495- The parameter (between 0 and 1) to compute the variances in option -v. It
496- is the pseudo positive constant to add to the branch lengths
497- when calculating variances, to adjust the dependency of variances to
498- branch lengths. By default b is the maximum between median branch length
499- and 10/seqlength; but it should be adjusted based on how/whether the
500- input tree is relaxed or strict. The smaller it is the more variances
501- would be linear to branch lengths, which is relevant for strict clock.
502- The bigger it is the less effect of branch lengths on variances,
503- which might be better for relaxed clock.
504- -d inputDateFile
505- This options is used to read the name of the input date file which
506- contains temporal constraints of internal nodes
507- or tips. An internal node can be defined either by its label (given in
508- the input tree) or by a subset of tips that have it as
509- the most recent common ancestor (mrca). A date could be a real or a
510- string or format year-month-day.
511- The first line of this file is the number of temporal constraints. A
512- temporal constraint can be fixed date, or a
513- lower bound l(value), or an upper bound u(value), or an interval b(v1,v2)
514- For example, if the input tree has 4 taxa a,b,c,d, and an internal node
515- named n, then following is a possible date file:
516- 6
517- a l(2003.12)
518- b u(2007.07)
519- c 2005
520- d b(2001.2,2007.11)
521- mrca(a,b,c,d) b(2000,2001)
522- n l(2004.3)
523- If this option is omitted, and option -a, -z are also omitted, the
524- program will estimate relative dates by giving T[root]=0 and T[tips]=1.
525- -D outDateFormat
526- Specify output date format: 1 for real, 2 for year-month-day. By default
527- the program will guess the format of input dates and uses it for
528- output dates.
529- -e ZscoreOutlier
530- This option is used to estimate and exclude outlier nodes before dating
531- process.
532- LSD2 normalize the branch residus and decide a node is outlier if its
533- related residus is great than the ZscoreOutlier.
534- A normal value of ZscoreOutliercould be 3, but you can adjust it
535- bigger/smaller depending if you want to have
536- less/more outliers. Note that for now, some functionalities could not be
537- combined with outliers estimation, for example
538- estimating multiple rates, imprecise date constraints.
539- -f samplingNumberCI
540- This option calculates the confidence intervals of the estimated rate and
541- dates. The branch lengths of the esimated
542- tree are sampled samplingNumberCI times to generate a set of simulated
543- trees. To generate simulated lengths
544- for each branch, we use a Poisson distribution whose mean equals to the
545- estimated one multiplied by the sequence length, which is
546- 1000 by default if nothing was specified via option -s. Long sequence
547- length tends to give small confidence intervals. To avoid
548- over-estimate the confidence intervals in the case of very long sequence
549- length but not necessarily strict molecular clock, you
550- could use a smaller sequence length than the actual ones. Confidence
551- intervals are written in the nexus tree with label CI_height,
552- and can be visualzed with Figtree under Node bar feature.
553- -g outgroupFile
554- If your data contain outgroups, then specify the name of the outgroup
555- file here. The program will use the outgroups to root the trees.
556- If you use this combined with options -G, then the outgroups will be
557- removed. The format of this file should be:
558- n
559- OUTGROUP1
560- OUTGROUP2
561- ...
562- OUTGROUPn
563- -F
564- By default without this option, we impose the constraints that the date
565- of every node is equal or smaller then the
566- dates of its descendants, so the running time is quasi-linear. Using this
567- option we ignore this temporal constraints, and
568- the the running time becomes linear, much faster.
569- -h help
570- Print this message.
571- -i inputTreesFile
572- The name of the input trees file. It contains tree(s) in newick format,
573- each tree on one line. Note that the taxa sets of all
574- trees must be the same.
575- -j
576- Verbose mode for output messages.
577- -G
578- Use this option to remove the outgroups (given in option -g) in the
579- estimated tree. If this option is not used, the outgroups
580- will be kept and the root position in estimated on the branch defined by
581- the outgroups.
582- -l nullBlen
583- A branch in the input tree is considered informative if its length is
584- greater this value. By default it is 0.5/seq_length. Only
585- informative branches are forced to be bigger than a minimum branch length
586- (see option -u for more information about this).
587- -m samplingNumberOutlier
588- The number of dated nodes to be sampled when detecting outlier nodes.
589- This should be smaller than the number of dated nodes,
590- and is 10 by default.
591- -n datasetNumber
592- The number of trees that you want to read and analyse.
593- -o outputFile
594- The base name of the output files to write the results and the time-scale
595- trees.
596- -p partitionFile
597- The file that defines the partition of branches into multiple subsets in
598- the case that you know each subset has a different rate.
599- In the partition file, each line contains the name of the group, the
600- prior proportion of the group rate compared to the main rate
601- (selecting an appropriate value for this helps to converge faster), and a
602- list of subtrees whose branches are supposed to have the
603- same substitution rate. All branches that are not assigned to any subtree
604- form a group having another rate.
605- A subtree is defined between {}: its first node corresponds to the root
606- of the subtree, and the following nodes (if there any)
607- correspond to the tips of the subtree. If the first node is a tip label
608- then it takes the mrca of all tips as the root of the subtree.
609- If the tips of the subtree are not defined (so there's only the defined
610- root), then by
611- default this subtree is extended down to the tips of the full tree. For
612- example the input tree is
613- ((A:0.12,D:0.12)n1:0.3,((B:0.3,C:0.5)n2:0.4,(E:0.5,(F:0.2,G:0.3)n3:0.33)
614- n4:0.22)n5:0.2)root;
615- and you have the following partition file:
616- group1 1 {n1} {n5 n4}
617- group2 1 {n3}
618- then there are 3 rates: the first one includes the branches (n1,A),
619- (n1,D), (n5,n4), (n5,n2), (n2,B), (n2,C); the second one
620- includes the branches (n3,F), (n3,G), and the last one includes all the
621- remaining branches. If the internal nodes don't have labels,
622- then they can be defined by mrca of at least two tips, for example n1 is
623- mrca(A,D)
624- -q standardDeviationRelaxedClock
625- This value is involved in calculating confidence intervals to simulate a
626- lognormal relaxed clock. We multiply the simulated branch lengths
627- with a lognormal distribution with mean 1, and standard deviation q. By
628- default q is 0.2. The bigger q is, the more your tree is relaxed
629- and give you bigger confidence intervals.
630- -r rootingMethod
631- This option is used to specify the rooting method to estimate the
632- position of the root for unrooted trees, or
633- re-estimate the root for rooted trees. The principle is to search for the
634- position of the root that minimizes
635- the objective function.
636- Use -r l if your tree is rooted, and you want to re-estimate the root
637- locally around the given root.
638- Use -r a if you want to estimate the root on all branches (ignoring the
639- given root if the tree is rooted).
640- In this case, if the constrained mode is chosen (option -c), method
641- "a" first estimates the root without using the constraints.
642- After that, it uses the constrained mode to improve locally the
643- position of the root around this pre-estimated root.
644- Use -r as if you want to estimate to root using constrained mode on all
645- branches.
646- Use -r k if you want to re-estimate the root position on the same branche
647- of the given root.
648- If combined with option -g, the root will be estimated on the branche
649- defined by the outgroups.
650- -R round_time
651- This value is used to round the minimum branch length of the time scaled
652- tree. The purpose of this is to make the minimum branch length
653- a meaningful time unit, such as day, week, year ... By default this value
654- is 365, so if the input dates are year, the minimum branch
655- length is rounded to day. The rounding formula is round(R*minblen)/R.
656- -s sequenceLength
657- This option is used to specify the sequence length when estimating
658- confidence intervals (option -f). It is used to generate
659- integer branch lengths (number of substitutions) by multiplying this with
660- the estimated branch lengths. By default it is 1000.
661- -S minSupport
662- Together with collapsing internal short branches (see option -l), users
663- can also collapse internal branches having weak support values (if
664- provided in the input tree) by using this option. The program will
665- collapse all internal branches having support <= the specifed value.
666- -t rateLowerBound
667- This option corresponds to the lower bound for the estimating rate. It is
668- 1e-10 by default.
669- -u minBlen
670- By default without this option, lsd2 forces every branch of the time
671- scaled tree to be greater than 1/(seq_length*rate) where rate is
672- an pre-estimated median rate. This value is rounded to the number of days
673- or weeks or years, depending on the rounding parameter -R.
674- By using option -u, the program will not estimate the minimum branch
675- length but use the specified value instead.
676- -U minExBlen
677- Similar to option -u but applies for external branches if specified. If
678- it's not specified then the minimum branch length of external
679- branches is set the same as the one of internal branch.
680- -v variance
681- Use this option to specify the way you want to apply variances for the
682- branch lengths. Variances are used to recompense big errors on
683- long estimated branch lengths. The variance of the branch Bi is Vi =
684- (Bi+b) where b is specified by option -b.
685- If variance=0, then we don't use variance. If variance=1, then LSD uses
686- the input branch lengths to calculate variances.
687- If variance=2, then LSD runs twice where the second time it calculates
688- the variances based on the estimated branch
689- lengths of the first run. By default variance=1.
690- -V
691- Get the actual version.
692- -w givenRte
693- This option is used to specify the name of the file containing the
694- substitution rates.
695- In this case, the program will use the given rates to estimate the dates
696- of the nodes.
697- This file should have the following format
698- RATE1
699- RATE2
700- ...
701- where RATEi is the rate of the tree i in the inputTreesFile.
702- -z tipsDate
703- To specify the tips date if they are all equal. If the tips date is not a
704- number, but a string (ex: 2020-01-10, or b(2019,2020))
705- then it should be put between the quotes.
706- ```