|
| 1 | +# Intro |
| 2 | + |
| 3 | +This recipe explains how to simulate an alignment which comes from multiple |
| 4 | +trees. |
| 5 | + |
| 6 | +For this example, we'll start by simulate a 5000bp alignment where the first |
| 7 | +3500bp comes from one tree, and the remaining 1500bp comes from another tree. |
| 8 | +Then we'll do something with a *lot* more trees. |
| 9 | + |
| 10 | +Hopefully the combination of these two recipes will give you a good idea of how |
| 11 | +to do many other simulations... |
| 12 | + |
| 13 | +# What you need |
| 14 | + |
| 15 | +* IQ-TREE 2.2 or above to simulate alignments using AliSim: http://www.iqtree.org/ |
| 16 | +* AMAS: https://github.com/marekborowiec/AMAS |
| 17 | + |
| 18 | +I use conda to install both of these, and suggest you do to. If you want to do |
| 19 | +that, here's one way to do it: |
| 20 | + |
| 21 | +```bash |
| 22 | +# set up a fresh environment and activate it |
| 23 | +conda create --name simaln conda activate simaln |
| 24 | + |
| 25 | +# install what we need for this recipe |
| 26 | +conda install -c bioconda iqtree amas ``` |
| 27 | +
|
| 28 | +# Simulating a two-tree alignment |
| 29 | +
|
| 30 | +First let's simulate our two separate alignments. The commandlines below each |
| 31 | +simulate an alignment using a randomly generated 20 taxon Yule tree and a |
| 32 | +GTR+I+R8 model with random parameters. You can change all of this of course - |
| 33 | +for that just head over to the AliSim page and look at the various simulation |
| 34 | +options. Here the options we specify are: |
| 35 | +
|
| 36 | +* `--alism [NAME]` to specify that we want to simulate an alignment and call it |
| 37 | +`NAME` |
| 38 | +* `-t RANDOM{yh/20}` to specify that we want a randomly-generated Yule |
| 39 | +tree of 20 taxa |
| 40 | +* `-m GTR+I+R8` to specify the model of sequence evolution |
| 41 | +* `--length [NUMBER]` to specify the length of each alignment |
| 42 | +
|
| 43 | +``` |
| 44 | +iqtree2 --alisim part1 -t RANDOM{yh/20} -m GTR+I+R8 --length 9000 |
| 45 | +iqtree2 --alisim part2 -t RANDOM{yh/20} -m GTR+I+R8 --length 1000 |
| 46 | +``` |
| 47 | +
|
| 48 | +Now you'll see that you have `part1.phy` and `part2.phy`. Next we concatenate |
| 49 | +these two alignments with AMAS, like this: |
| 50 | +
|
| 51 | +``` |
| 52 | +AMAS.py concat -i part1.phy part2.phy -f phylip -u phylip -t part1_part2_concat.phy -d dna |
| 53 | +``` |
| 54 | + |
| 55 | +With AMAS, we just specify: |
| 56 | + |
| 57 | +* `-i [INPUT FILES]` the input files in the order we want to concatenate them |
| 58 | +* `-f [INPUT FORMAT]` input format of the alignments (phylip for us) |
| 59 | +* `-u [OUTPUT FORMAT]` output format of the concatenated alignment |
| 60 | +* `-t [OUTPUT FILENAME]` name of the output alignment |
| 61 | +* `-d [TYPE]` whether the input is `dna` or `aa` |
| 62 | + |
| 63 | +That's it! Your final alignmnet is in the file `part1_part2_concat.phy` |
| 64 | + |
| 65 | +# Generalising to any number of trees |
| 66 | + |
| 67 | +With a few tricks from bash, we can easily generalise the above into a method of |
| 68 | +creating concatenated alignments with any number of trees and any combination of |
| 69 | +lengths. Of course there are *lots* of ways to do this, but here's one. |
| 70 | + |
| 71 | +Here's the script, with comments to help decipher it. I'm not all that flash |
| 72 | +with bash, so there are likely simpler ways to do some of this. But it will get |
| 73 | +the job done. |
| 74 | + |
| 75 | +```bash |
| 76 | +#!/bin/bash |
| 77 | + |
| 78 | +# set the lengths as a list |
| 79 | +lengths=( 1000 1000 2000 5 3000 ) |
| 80 | + |
| 81 | +# get the number of alignments you want from the length of that list |
| 82 | +N=$(wc -w <<< "${lengths[*]}") |
| 83 | + |
| 84 | +# use a for loop to build the alignments one by one with IQ-TREE we'll also |
| 85 | +# collect up the files to concatenate in this empty list |
| 86 | +to_concat=( ) |
| 87 | + |
| 88 | +for (( i=0; i<$N; i++ )); do len=${lengths[$i]} |
| 89 | + |
| 90 | + # print to show what we're doing |
| 91 | + echo "index: $i, length: $len" iqtree2 --alisim $i -t RANDOM{yh/20} -m |
| 92 | + GTR+I+R8 --length $len |
| 93 | + |
| 94 | + # add the filename to the list of things to concatenate |
| 95 | + to_concat+=( $i".phy" ) |
| 96 | + |
| 97 | +done |
| 98 | + |
| 99 | +# check the list of files we're going to concatenate |
| 100 | +echo ${to_concat[*]} |
| 101 | + |
| 102 | +# concatenate them all with AMAS |
| 103 | +AMAS.py concat -i ${to_concat[*]} -f phylip -u phylip -t $N"_concat.phy" -d dna |
| 104 | +``` |
| 105 | + |
| 106 | +# A few thoughts on that approach. |
| 107 | + |
| 108 | +First, it should work fine for reasonably sized sets of alignments (e.g. up to a |
| 109 | +thousand or so). But if you're looking to do millions then you'll probably want |
| 110 | +to do a few things much smarter, like parallelise the simulation steps, and |
| 111 | +maybe think about better ways to concatenate them (I have no idea how well AMAS |
| 112 | +scales to really huge jobs). |
| 113 | + |
| 114 | +Second, there's no guarantee that the trees will differ - these are random trees |
| 115 | +after all. So if you really care then it would be a good idea to double check |
| 116 | +that first, e.g. by simulating a set of trees that meet your requirments before |
| 117 | +running IQ-TREE at all (you can pass individual trees to AliSim no problem). |
| 118 | + |
| 119 | +Third, if AliSim fails to simulate an alignment for whatever reason, this script |
| 120 | +will fall over, because AMAS will be trying to concatenate an alignment that |
| 121 | +doesn't exist. So if you want to use this for research you should certainly add |
| 122 | +in some basic error checking, e.g. that the length of the final alignment |
| 123 | +matches the sum of the lengths of the inputs, that each file exists before you |
| 124 | +try to concatenate it, etc. |
| 125 | + |
| 126 | +Finally, it may be useful to make the script take some parameters as input - |
| 127 | +like the list of lengths, or the model, number of taxa, or whatever. All this is |
| 128 | +pretty simple, and you can find out how with google and Stack Overflow! |
| 129 | + |
| 130 | +Still, I hope this is helpful and gives you some ways of getting started. |
| 131 | + |
| 132 | + |
0 commit comments