Skip to content

Commit 80ec245

Browse files
committed
update link to concordance recipes
1 parent 9ab1211 commit 80ec245

3 files changed

Lines changed: 133 additions & 108 deletions

File tree

doc/Simulating-concatenated-alignments-with-multiple-trees.md

Lines changed: 0 additions & 107 deletions
This file was deleted.
Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,132 @@
1+
# Intro
2+
3+
This recipe explains how to simulate an alignment which comes from multiple
4+
trees.
5+
6+
For this example, we'll start by simulate a 5000bp alignment where the first
7+
3500bp comes from one tree, and the remaining 1500bp comes from another tree.
8+
Then we'll do something with a *lot* more trees.
9+
10+
Hopefully the combination of these two recipes will give you a good idea of how
11+
to do many other simulations...
12+
13+
# What you need
14+
15+
* IQ-TREE 2.2 or above to simulate alignments using AliSim: http://www.iqtree.org/
16+
* AMAS: https://github.com/marekborowiec/AMAS
17+
18+
I use conda to install both of these, and suggest you do to. If you want to do
19+
that, here's one way to do it:
20+
21+
```bash
22+
# set up a fresh environment and activate it
23+
conda create --name simaln conda activate simaln
24+
25+
# install what we need for this recipe
26+
conda install -c bioconda iqtree amas ```
27+
28+
# Simulating a two-tree alignment
29+
30+
First let's simulate our two separate alignments. The commandlines below each
31+
simulate an alignment using a randomly generated 20 taxon Yule tree and a
32+
GTR+I+R8 model with random parameters. You can change all of this of course -
33+
for that just head over to the AliSim page and look at the various simulation
34+
options. Here the options we specify are:
35+
36+
* `--alism [NAME]` to specify that we want to simulate an alignment and call it
37+
`NAME`
38+
* `-t RANDOM{yh/20}` to specify that we want a randomly-generated Yule
39+
tree of 20 taxa
40+
* `-m GTR+I+R8` to specify the model of sequence evolution
41+
* `--length [NUMBER]` to specify the length of each alignment
42+
43+
```
44+
iqtree2 --alisim part1 -t RANDOM{yh/20} -m GTR+I+R8 --length 9000
45+
iqtree2 --alisim part2 -t RANDOM{yh/20} -m GTR+I+R8 --length 1000
46+
```
47+
48+
Now you'll see that you have `part1.phy` and `part2.phy`. Next we concatenate
49+
these two alignments with AMAS, like this:
50+
51+
```
52+
AMAS.py concat -i part1.phy part2.phy -f phylip -u phylip -t part1_part2_concat.phy -d dna
53+
```
54+
55+
With AMAS, we just specify:
56+
57+
* `-i [INPUT FILES]` the input files in the order we want to concatenate them
58+
* `-f [INPUT FORMAT]` input format of the alignments (phylip for us)
59+
* `-u [OUTPUT FORMAT]` output format of the concatenated alignment
60+
* `-t [OUTPUT FILENAME]` name of the output alignment
61+
* `-d [TYPE]` whether the input is `dna` or `aa`
62+
63+
That's it! Your final alignmnet is in the file `part1_part2_concat.phy`
64+
65+
# Generalising to any number of trees
66+
67+
With a few tricks from bash, we can easily generalise the above into a method of
68+
creating concatenated alignments with any number of trees and any combination of
69+
lengths. Of course there are *lots* of ways to do this, but here's one.
70+
71+
Here's the script, with comments to help decipher it. I'm not all that flash
72+
with bash, so there are likely simpler ways to do some of this. But it will get
73+
the job done.
74+
75+
```bash
76+
#!/bin/bash
77+
78+
# set the lengths as a list
79+
lengths=( 1000 1000 2000 5 3000 )
80+
81+
# get the number of alignments you want from the length of that list
82+
N=$(wc -w <<< "${lengths[*]}")
83+
84+
# use a for loop to build the alignments one by one with IQ-TREE we'll also
85+
# collect up the files to concatenate in this empty list
86+
to_concat=( )
87+
88+
for (( i=0; i<$N; i++ )); do len=${lengths[$i]}
89+
90+
# print to show what we're doing
91+
echo "index: $i, length: $len" iqtree2 --alisim $i -t RANDOM{yh/20} -m
92+
GTR+I+R8 --length $len
93+
94+
# add the filename to the list of things to concatenate
95+
to_concat+=( $i".phy" )
96+
97+
done
98+
99+
# check the list of files we're going to concatenate
100+
echo ${to_concat[*]}
101+
102+
# concatenate them all with AMAS
103+
AMAS.py concat -i ${to_concat[*]} -f phylip -u phylip -t $N"_concat.phy" -d dna
104+
```
105+
106+
# A few thoughts on that approach.
107+
108+
First, it should work fine for reasonably sized sets of alignments (e.g. up to a
109+
thousand or so). But if you're looking to do millions then you'll probably want
110+
to do a few things much smarter, like parallelise the simulation steps, and
111+
maybe think about better ways to concatenate them (I have no idea how well AMAS
112+
scales to really huge jobs).
113+
114+
Second, there's no guarantee that the trees will differ - these are random trees
115+
after all. So if you really care then it would be a good idea to double check
116+
that first, e.g. by simulating a set of trees that meet your requirments before
117+
running IQ-TREE at all (you can pass individual trees to AliSim no problem).
118+
119+
Third, if AliSim fails to simulate an alignment for whatever reason, this script
120+
will fall over, because AMAS will be trying to concatenate an alignment that
121+
doesn't exist. So if you want to use this for research you should certainly add
122+
in some basic error checking, e.g. that the length of the final alignment
123+
matches the sum of the lengths of the inputs, that each file exists before you
124+
try to concatenate it, etc.
125+
126+
Finally, it may be useful to make the script take some parameters as input -
127+
like the list of lengths, or the model, number of taxa, or whatever. All this is
128+
pretty simple, and you can find out how with google and Stack Overflow!
129+
130+
Still, I hope this is helpful and gives you some ways of getting started.
131+
132+

doc/recipes/index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,4 +27,4 @@ If you'd like to request a recipe, please head over to the issues page and use t
2727

2828
* [How to do phylogenetic placement](phylogenetic-placement)
2929
* [How to perform a Goldman-Cox test](goldman-cox-test)
30-
* [Estimating gene, site, and quartet concordance vectors](concordance-vector)
30+
* [Estimating gene, site, and quartet concordance vectors](concordance-vector)

0 commit comments

Comments
 (0)