Commit 3ed2e96

move LSD2 full commands to a separate file
1 parent a07889f commit 3ed2e96

2 files changed

Lines changed: 246 additions & 232 deletions

doc/Dating.md

Lines changed: 2 additions & 232 deletions
@@ -452,236 +452,6 @@ to control the way that LSD2 treats outliers, you can do this:
 iqtree -s ALN_FILE --date DATE_FILE --date-options "-e 2"
 
 A full list of the options for LSD2 can be obtained by downloading LSD2 and
-running `lsd2 -h`, the output of that command is reproduced here for
-convenience:
+running `lsd2 -h`, the output of that command is [provided here](lsd2-help) for
+your convenience.
 
-```
-LSD: LEAST-SQUARES METHODS TO ESTIMATE RATES AND DATES - v.1.8
-
-DESCRIPTION
-This program estimates the rate and the dates of the input phylogenies given
-some temporal constraints.
-It minimizes the square errors of the branch lengths under normal
-distribution model.
-
-SYNOPSIS
-./lsd [-i inputFile] [-d inputDateFile] [-o outputFile] [-s sequenceLength]
-[-g outgroupFile] [-f nbSamplings]
-OPTIONS
--a rootDate
-To specify the root date if there's any. If the root date is not a
-number, but a string (ex: 2020-01-10, or b(2019,2020)) then it should
-be put between the quotes.
--b varianceParameter
-The parameter (between 0 and 1) to compute the variances in option -v. It
-is the pseudo positive constant to add to the branch lengths
-when calculating variances, to adjust the dependency of variances to
-branch lengths. By default b is the maximum between median branch length
-and 10/seqlength; but it should be adjusted based on how/whether the
-input tree is relaxed or strict. The smaller it is the more variances
-would be linear to branch lengths, which is relevant for strict clock.
-The bigger it is the less effect of branch lengths on variances,
-which might be better for relaxed clock.
--d inputDateFile
-This options is used to read the name of the input date file which
-contains temporal constraints of internal nodes
-or tips. An internal node can be defined either by its label (given in
-the input tree) or by a subset of tips that have it as
-the most recent common ancestor (mrca). A date could be a real or a
-string or format year-month-day.
-The first line of this file is the number of temporal constraints. A
-temporal constraint can be fixed date, or a
-lower bound l(value), or an upper bound u(value), or an interval b(v1,v2)
-For example, if the input tree has 4 taxa a,b,c,d, and an internal node
-named n, then following is a possible date file:
-6
-a l(2003.12)
-b u(2007.07)
-c 2005
-d b(2001.2,2007.11)
-mrca(a,b,c,d) b(2000,2001)
-n l(2004.3)
-If this option is omitted, and option -a, -z are also omitted, the
-program will estimate relative dates by giving T[root]=0 and T[tips]=1.
--D outDateFormat
-Specify output date format: 1 for real, 2 for year-month-day. By default
-the program will guess the format of input dates and uses it for
-output dates.
--e ZscoreOutlier
-This option is used to estimate and exclude outlier nodes before dating
-process.
-LSD2 normalize the branch residus and decide a node is outlier if its
-related residus is great than the ZscoreOutlier.
-A normal value of ZscoreOutliercould be 3, but you can adjust it
-bigger/smaller depending if you want to have
-less/more outliers. Note that for now, some functionalities could not be
-combined with outliers estimation, for example
-estimating multiple rates, imprecise date constraints.
--f samplingNumberCI
-This option calculates the confidence intervals of the estimated rate and
-dates. The branch lengths of the esimated
-tree are sampled samplingNumberCI times to generate a set of simulated
-trees. To generate simulated lengths
-for each branch, we use a Poisson distribution whose mean equals to the
-estimated one multiplied by the sequence length, which is
-1000 by default if nothing was specified via option -s. Long sequence
-length tends to give small confidence intervals. To avoid
-over-estimate the confidence intervals in the case of very long sequence
-length but not necessarily strict molecular clock, you
-could use a smaller sequence length than the actual ones. Confidence
-intervals are written in the nexus tree with label CI_height,
-and can be visualzed with Figtree under Node bar feature.
--g outgroupFile
-If your data contain outgroups, then specify the name of the outgroup
-file here. The program will use the outgroups to root the trees.
-If you use this combined with options -G, then the outgroups will be
-removed. The format of this file should be:
-n
-OUTGROUP1
-OUTGROUP2
-...
-OUTGROUPn
--F
-By default without this option, we impose the constraints that the date
-of every node is equal or smaller then the
-dates of its descendants, so the running time is quasi-linear. Using this
-option we ignore this temporal constraints, and
-the the running time becomes linear, much faster.
--h help
-Print this message.
--i inputTreesFile
-The name of the input trees file. It contains tree(s) in newick format,
-each tree on one line. Note that the taxa sets of all
-trees must be the same.
--j
-Verbose mode for output messages.
--G
-Use this option to remove the outgroups (given in option -g) in the
-estimated tree. If this option is not used, the outgroups
-will be kept and the root position in estimated on the branch defined by
-the outgroups.
--l nullBlen
-A branch in the input tree is considered informative if its length is
-greater this value. By default it is 0.5/seq_length. Only
-informative branches are forced to be bigger than a minimum branch length
-(see option -u for more information about this).
--m samplingNumberOutlier
-The number of dated nodes to be sampled when detecting outlier nodes.
-This should be smaller than the number of dated nodes,
-and is 10 by default.
--n datasetNumber
-The number of trees that you want to read and analyse.
--o outputFile
-The base name of the output files to write the results and the time-scale
-trees.
--p partitionFile
-The file that defines the partition of branches into multiple subsets in
-the case that you know each subset has a different rate.
-In the partition file, each line contains the name of the group, the
-prior proportion of the group rate compared to the main rate
-(selecting an appropriate value for this helps to converge faster), and a
-list of subtrees whose branches are supposed to have the
-same substitution rate. All branches that are not assigned to any subtree
-form a group having another rate.
-A subtree is defined between {}: its first node corresponds to the root
-of the subtree, and the following nodes (if there any)
-correspond to the tips of the subtree. If the first node is a tip label
-then it takes the mrca of all tips as the root of the subtree.
-If the tips of the subtree are not defined (so there's only the defined
-root), then by
-default this subtree is extended down to the tips of the full tree. For
-example the input tree is
-((A:0.12,D:0.12)n1:0.3,((B:0.3,C:0.5)n2:0.4,(E:0.5,(F:0.2,G:0.3)n3:0.33)
-n4:0.22)n5:0.2)root;
-and you have the following partition file:
-group1 1 {n1} {n5 n4}
-group2 1 {n3}
-then there are 3 rates: the first one includes the branches (n1,A),
-(n1,D), (n5,n4), (n5,n2), (n2,B), (n2,C); the second one
-includes the branches (n3,F), (n3,G), and the last one includes all the
-remaining branches. If the internal nodes don't have labels,
-then they can be defined by mrca of at least two tips, for example n1 is
-mrca(A,D)
--q standardDeviationRelaxedClock
-This value is involved in calculating confidence intervals to simulate a
-lognormal relaxed clock. We multiply the simulated branch lengths
-with a lognormal distribution with mean 1, and standard deviation q. By
-default q is 0.2. The bigger q is, the more your tree is relaxed
-and give you bigger confidence intervals.
--r rootingMethod
-This option is used to specify the rooting method to estimate the
-position of the root for unrooted trees, or
-re-estimate the root for rooted trees. The principle is to search for the
-position of the root that minimizes
-the objective function.
-Use -r l if your tree is rooted, and you want to re-estimate the root
-locally around the given root.
-Use -r a if you want to estimate the root on all branches (ignoring the
-given root if the tree is rooted).
-In this case, if the constrained mode is chosen (option -c), method
-"a" first estimates the root without using the constraints.
-After that, it uses the constrained mode to improve locally the
-position of the root around this pre-estimated root.
-Use -r as if you want to estimate to root using constrained mode on all
-branches.
-Use -r k if you want to re-estimate the root position on the same branche
-of the given root.
-If combined with option -g, the root will be estimated on the branche
-defined by the outgroups.
--R round_time
-This value is used to round the minimum branch length of the time scaled
-tree. The purpose of this is to make the minimum branch length
-a meaningful time unit, such as day, week, year ... By default this value
-is 365, so if the input dates are year, the minimum branch
-length is rounded to day. The rounding formula is round(R*minblen)/R.
--s sequenceLength
-This option is used to specify the sequence length when estimating
-confidence intervals (option -f). It is used to generate
-integer branch lengths (number of substitutions) by multiplying this with
-the estimated branch lengths. By default it is 1000.
--S minSupport
-Together with collapsing internal short branches (see option -l), users
-can also collapse internal branches having weak support values (if
-provided in the input tree) by using this option. The program will
-collapse all internal branches having support <= the specifed value.
--t rateLowerBound
-This option corresponds to the lower bound for the estimating rate. It is
-1e-10 by default.
--u minBlen
-By default without this option, lsd2 forces every branch of the time
-scaled tree to be greater than 1/(seq_length*rate) where rate is
-an pre-estimated median rate. This value is rounded to the number of days
-or weeks or years, depending on the rounding parameter -R.
-By using option -u, the program will not estimate the minimum branch
-length but use the specified value instead.
--U minExBlen
-Similar to option -u but applies for external branches if specified. If
-it's not specified then the minimum branch length of external
-branches is set the same as the one of internal branch.
--v variance
-Use this option to specify the way you want to apply variances for the
-branch lengths. Variances are used to recompense big errors on
-long estimated branch lengths. The variance of the branch Bi is Vi =
-(Bi+b) where b is specified by option -b.
-If variance=0, then we don't use variance. If variance=1, then LSD uses
-the input branch lengths to calculate variances.
-If variance=2, then LSD runs twice where the second time it calculates
-the variances based on the estimated branch
-lengths of the first run. By default variance=1.
--V
-Get the actual version.
--w givenRte
-This option is used to specify the name of the file containing the
-substitution rates.
-In this case, the program will use the given rates to estimate the dates
-of the nodes.
-This file should have the following format
-RATE1
-RATE2
-...
-where RATEi is the rate of the tree i in the inputTreesFile.
--z tipsDate
-To specify the tips date if they are all equal. If the tips date is not a
-number, but a string (ex: 2020-01-10, or b(2019,2020))
-then it should be put between the quotes.
-```
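
The date file format described under option `-d` in the removed help text (a count line followed by one constraint per line) can be sketched in Python as follows. This is a minimal illustration, not part of LSD2 or IQ-TREE: the helper names `lower`, `upper`, `between`, `mrca` and `format_date_file` are hypothetical, and the values are the ones from the help text's own example.

```python
# Hypothetical helpers that build the constraint strings named in the -d help text.
def lower(value):
    """Lower bound: l(value)."""
    return f"l({value})"


def upper(value):
    """Upper bound: u(value)."""
    return f"u({value})"


def between(low, high):
    """Interval: b(v1,v2)."""
    return f"b({low},{high})"


def mrca(*tips):
    """Internal node given as the most recent common ancestor of some tips."""
    return "mrca(" + ",".join(tips) + ")"


def format_date_file(constraints):
    """Return date file text: the number of constraints, then one per line."""
    lines = [str(len(constraints))]
    lines += [f"{node} {date}" for node, date in constraints]
    return "\n".join(lines) + "\n"


# Values kept as strings so they are written exactly as typed.
constraints = [
    ("a", lower("2003.12")),
    ("b", upper("2007.07")),
    ("c", "2005"),
    ("d", between("2001.2", "2007.11")),
    (mrca("a", "b", "c", "d"), between("2000", "2001")),
    ("n", lower("2004.3")),
]

date_file_text = format_date_file(constraints)
print(date_file_text, end="")
```

The printed text matches the six-constraint example in the help output and could be written to a file passed to `lsd2 -d`.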

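
The help text for options `-u` and `-R` gives two formulas: a default minimum branch length of `1/(seq_length*rate)` and the rounding rule `round(R*minblen)/R`. The sketch below evaluates them for illustrative inputs. The function names are hypothetical, and the half-up rounding is an assumption about how LSD2 treats exact halves, so results at halfway points may differ from the program's.

```python
import math


def default_min_blen(seq_length, rate):
    """Default minimum branch length from the -u help text: 1/(seq_length*rate)."""
    return 1.0 / (seq_length * rate)


def round_min_blen(min_blen, round_time=365):
    """Rounding rule from the -R help text: round(R*minblen)/R.

    Assumption: halves are rounded up (floor(x + 0.5)), which is fine for the
    non-negative values used here but is not confirmed by the help text.
    """
    return math.floor(round_time * min_blen + 0.5) / round_time


# Illustrative inputs: sequence length 1000 (the documented default for -s)
# and a made-up median rate of 0.003 substitutions per site per year.
raw = default_min_blen(1000, 0.003)
rounded = round_min_blen(raw)  # R = 365, so the unit is one day
print(raw, rounded, round(rounded * 365))
```

With dates in years and the default `R = 365`, the rounded value is a whole number of days expressed in years.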