Commit 029091a

Updated Frequently Asked Questions (markdown)
1 parent aa8df4f commit 029091a

1 file changed

Lines changed: 7 additions & 5 deletions

doc/Frequently-Asked-Questions.md

@@ -85,8 +85,7 @@ How does IQ-TREE treat gap/missing/ambiguous characters?
8585
<div class="hline"></div>
8686

8787
Gaps (`-`) and missing characters (`?` or `N` for DNA alignments) are treated in the same way as `unknown` characters, which represent no information. The same treatment holds for many other ML software (e.g., RAxML, PhyML). More explicitly,
88-
for a site (column) of an alignment containing `AC-AG-A` (i.e. A for sequence 1, C for sequence 2, `-` for sequence 3, and so on), the site-likelihood
89-
of a tree T is equal to the site-likelihood of the subtree of T restricted to those sequences containing non-gap characters (`ACAGA`).
88+
for a site (column) of an alignment containing `AC-AG-A` (i.e. A for sequence 1, C for sequence 2, `-` for sequence 3, and so on), the site-likelihood of a tree T is equal to the site-likelihood of the subtree of T restricted to those sequences containing non-gap characters (`ACAGA`).
9089

9190
Ambiguous characters that represent more than one character are also supported: each represented character will have equal likelihood. For DNA the following ambiguous nucleotides are supported according to [IUPAC nomenclature](https://en.wikipedia.org/wiki/Nucleic_acid_notation):
9291

@@ -102,17 +101,20 @@ Ambiguous characters that represent more than one character are also supported:
102101
| H | A, C or T (next letter after G) |
103102
| D | A, G or T (next letter after C) |
104103
| V | A, G or C (next letter after T) |
105-
| ?, -, ., ~, O, N, X | A, G, C or T (unknown; all 4 nucleotides are equally likely) |
104+
| ?, -, ., ~, !, O, N, X | A, G, C or T (unknown; all 4 nucleotides are equally likely) |
106105

107-
For protein the following ambiguous amino-acids are supported:
106+
For protein sequences the following ambiguous amino-acids are supported:
108107

109108
| Amino-acid | Meaning |
110109
|------------|---------------------------------------------------------------|
111110
| B | N or D |
112111
| Z | Q or E |
113112
| J | I or L |
114113
| U | unknown AA (although it is the 21st AA) |
115-
| ?, -, ., ~, * or X | unknown AA (all 20 AAs are equally likely) |
114+
| ?, -, ., ~, *, ! or X | unknown AA (all 20 AAs are equally likely) |
115+
116+
The letters `*` and `!` may be found in alignments of protein and/or coding DNA sequences.
117+
A stop codon is typically translated to `*`. Some alignment programs also mark frameshift mutations (cf. [Ranwez et al., 2011]): because a frameshift in a codon alignment produces incomplete codons that cannot be translated unambiguously, the affected position in the translated protein sequence and the padding positions in the corresponding codon are marked with `!`.
116118

117119

118120
Can I mix DNA and protein data in a partitioned analysis?
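
As an illustration of the gap and ambiguity handling documented in this change, the following is a minimal sketch and not IQ-TREE's actual implementation. It assumes a Jukes-Cantor (JC69) model and a star tree, where every sequence hangs off the root by its own branch, so that "restricting the tree" simply means dropping the branches of gapped sequences. The names `tip_vector`, `jc69` and `site_likelihood` are hypothetical, and the two-letter ambiguity codes and `B` come from the standard IUPAC table rather than from the rows visible in this diff.

```python
import math

NUCS = "ACGT"

# Ambiguity codes -> nucleotides they may represent.
# H, D, V follow the table in the diff; the rest are standard IUPAC codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "B": "CGT", "H": "ACT", "D": "AGT", "V": "ACG",
}
# Characters listed in the diff as "unknown" for DNA alignments.
UNKNOWN = set("?-.~!ONX")


def tip_vector(ch):
    """Partial likelihood at a tip: 1.0 for every state compatible with ch."""
    ch = ch.upper()
    if ch in UNKNOWN:
        return [1.0] * 4
    if ch not in IUPAC:
        raise ValueError(f"unsupported DNA character: {ch!r}")
    return [1.0 if n in IUPAC[ch] else 0.0 for n in NUCS]


def jc69(t):
    """JC69 transition probabilities (same state, different state) for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def site_likelihood(column, branch_lengths):
    """Site likelihood on a star tree (assumption: one branch per sequence)."""
    total = 0.0
    for x in range(4):  # sum over the unobserved root state
        prod = 1.0
        for ch, t in zip(column, branch_lengths):
            p_same, p_diff = jc69(t)
            tip = tip_vector(ch)
            prod *= sum((p_same if x == y else p_diff) * tip[y] for y in range(4))
        total += 0.25 * prod  # equal base frequencies under JC69
    return total


# Column from the FAQ text, with arbitrary illustrative branch lengths.
full = site_likelihood("AC-AG-A", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
# Same column with the two gapped sequences (and their branches) removed.
restricted = site_likelihood("ACAGA", [0.1, 0.2, 0.4, 0.5, 0.7])
print(math.isclose(full, restricted))
```

A gap contributes a tip vector of all ones, and each row of the transition matrix sums to one, so the gapped branch multiplies the likelihood by 1. That is why the two values agree up to floating-point rounding in this simplified setting.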
