But we can use the tree with branch IDs to put any label on a tree. An example is in the `change_labels.R` script. As written, this script just updates the branch ID labels in the `gcf.cf.branch` tree to show the ID and the three concordance factors (the Ψ<sub>1</sub> values), each labelled with the first letter of the input data (i.e. `g` for genes, `s` for sites, and `q` for quartets), like so. You can run this script like so:
267
+
But we can use the tree with branch IDs to put any label on a tree. An example is in the `change_labels.R` script. As written, this script just updates the branch ID labels in the `gcf.cf.branch` tree to show the ID and the three concordance factors (the Ψ<sub>1</sub> values), each labelled with the first letter of the input data (i.e. `g` for genes, `s` for sites, and `q` for quartets). You can run this script like so:
268
268
269
269
```
270
270
Rscript change_labels.R
@@ -274,68 +274,70 @@ This will output a nexus-formatted tree file called `id_gcf_scf_qcf.nex`. Each b
274
274
275
275
`391-g98.54-s84.09-q98.54`
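Labels in this format are easy to take apart again if you want the values in a table. Here is a minimal Python sketch, not part of the recipe's R scripts, that splits the example label into its branch ID and three concordance factors. The function name `parse_branch_label` is made up for illustration, and the sketch assumes every label follows the `id-g…-s…-q…` pattern exactly, with no negative values.

```python
def parse_branch_label(label):
    """Split a label like '391-g98.54-s84.09-q98.54' into named parts."""
    # Map the one-letter prefixes to the factor names used in the file name
    # id_gcf_scf_qcf.nex (genes, sites, quartets).
    names = {"g": "gcf", "s": "scf", "q": "qcf"}
    branch_id, *factors = label.split("-")
    parsed = {"id": int(branch_id)}
    for item in factors:
        # The first character is the prefix, the rest is the number.
        parsed[names[item[0]]] = float(item[1:])
    return parsed


print(parse_branch_label("391-g98.54-s84.09-q98.54"))
# prints {'id': 391, 'gcf': 98.54, 'scf': 84.09, 'qcf': 98.54}
```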
276
276
277
-
The first number is the branch ID, and the next three are the three concordance factors. This can be useful for exploring your data. For example, in my analysis, this part of the tree has some interesting nodes:
277
+
The first number is the branch ID, and the next three are the three concordance factors. This can be useful for exploring your data. For example, if you look at the part of the species tree we inferred in this recipe that groups the kiwis (genus *Apteryx*), you can see that there is a lot of concordance in this part of the tree:
* **Node 642**, which groups *Balaeniceps rex* (the shoebill) and *Scopus umbretta* (the hamerkop) has concordance factors very close to a third
282
-
* **Node 641**, which adds *Pelecanus crispus* (the Dalmatian pelican) to the group, has much higher gene and quartet concordance factors, but a low site concordance factor
283
-
* **Node 640**, which adds *Mesembrinibis cayennensis* (the green ibis) and *Nipponia nippon* (the crested ibis) also has low concordance factors (node 643, which groups the ibises, has very high concordance factors)
The concordance factors tell you a certain amount, but to understand things better, you really need to examine the concordance vectors.
286
282
287
283
> If you want to put different labels on your tree, that is relatively simple to do by editing the `change_labels.R` script, which you can get from GitHub here: [https://github.com/roblanf/concordance_vectors/blob/main/change_labels.R](https://github.com/roblanf/concordance_vectors/blob/main/change_labels.R)
288
284
289
285
# Generate concordance tables for branches of interest
290
286
291
-
A concordance table is just a table of the three concordance vectors, as shown in the Lanfear and Hahn paper. The `concordance_table.R` script lets you generate concordance tables for any node, based on the branch ID. Let's do that for node 642. The script takes two input files:
287
+
A concordance table is just a table of the three concordance vectors, as shown in the Lanfear and Hahn paper. The `concordance_table.R` script lets you generate a concordance table for any node, based on the branch ID. Here we'll do that for two branches that were recovered in the original Nature paper, discussed in Lanfear and Hahn, and also recovered in the ASTRAL tree we estimated here from 400 loci (I found the branch IDs for these branches by studying the tree labelled with branch IDs that I made above):
288
+
289
+
* **Branch 598**: the Palaeognathae (kiwis and other cool birds)
290
+
* **Branch 545**: the Telluraves (passerines and other closely related groups)
291
+
292
+
The `concordance_table.R` script takes two input variables:
292
293
293
294
* the `concordance_vectors.csv` file we generated above
294
-
* the branch ID, `642` in this case
295
+
* the branch ID
295
296
296
-
You can run it like this
297
+
So to get the tables for our two branches, we run it once for each as follows:
Clearly there is substantial discordance around this branch! Compare this to the Palaeognathae, which have far less discordance:
328
333
329
-
You'll notice that both include 95% confidence intervals for the concordance and discordance factors. These are calculated using 1000 bootstraps of the count data, and provide useful context for interpreting the values, and particularly for interpreting potential *differences* in the values.
330
-
331
-
You'll notice that for node 642, there is an enormous amount of discordance. Indeed, the first three entries of the vector all have confidence intervals that overlap considerably for gene and quartet concordance factors. So although ASTRAL did what it is supposed to do and chose the node with the highest quartet concordance factor, it would be difficult to be extremely confident that this is the correct topology for the species tree. In support of that, examining the `concordance_vectors.csv` file shows that the posterior probability for this branch is just 0.5, which is extremely low.
### Confidence intervals on the concordance vectors
336
339
337
-
Almost all genes and quartets are concordant with this node. The sites have more discordance, and Ψ<sub>2</sub> seems a lot higher for sites than the other entries in the vector. This deserves further investigation, but *could* occur if a lot of the genes that support Ψ<sub>2</sub> are very informative, and/or if there's a lot of homoplasy.
340
+
You'll notice that the tables include 95% confidence intervals for the concordance and discordance factors. These are calculated using 1000 bootstraps of the count data, and provide useful context for interpreting the values, and particularly for interpreting potential *differences* in the values.
338
341
339
-
# Conclusion
342
+
These bootstrap confidence intervals are calculated by resampling from the counts for each concordance vector. The total sample size for each category (genes, sites, and quartets) is shown on the table underneath the y-axis label. Note that for sites and quartets, the counts are not always whole numbers because of how they are calculated. This also means that the bootstrap confidence intervals can be a little off for very low counts, because the counts have to be rounded to integers in order to calculate the intervals.
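To make the resampling idea concrete, here is a minimal Python sketch of a count-based bootstrap. It is not the code in `concordance_table.R`: the function name `bootstrap_ci`, the example counts, and the use of simple percentile intervals are all assumptions for illustration. It rounds the counts to integers, resamples the same total number of observations 1000 times, and reports the 2.5th and 97.5th percentiles of each entry as a percentage.

```python
import random


def bootstrap_ci(counts, n_boot=1000, seed=1):
    """Percentile 95% intervals (in %) for each entry of a vector of counts."""
    rng = random.Random(seed)
    # Counts for sites and quartets may be fractional, so round them first.
    int_counts = [round(c) for c in counts]
    total = sum(int_counts)
    categories = list(range(len(int_counts)))
    samples = [[] for _ in categories]
    for _ in range(n_boot):
        # Draw `total` observations, each category weighted by its count.
        draw = rng.choices(categories, weights=int_counts, k=total)
        for i in categories:
            samples[i].append(100 * draw.count(i) / total)
    intervals = []
    for values in samples:
        values.sort()
        lower = values[int(0.025 * n_boot)]
        upper = values[int(0.975 * n_boot) - 1]
        intervals.append((lower, upper))
    return intervals


# Hypothetical counts for the four entries of one concordance vector.
for lower, upper in bootstrap_ci([250.0, 80.4, 60.3, 9.3]):
    print(f"{lower:.2f} to {upper:.2f}")
```

With a count as low as the last entry here, rounding 9.3 down to 9 changes the resampled proportion noticeably, which is the small-count caveat described above.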
340
343
341
-
We hope this recipe provides some useful guidance on calculating concordance vectors for your data!