doc/Frequently-Asked-Questions.md
Lines changed: 7 additions & 5 deletions
@@ -85,8 +85,7 @@ How does IQ-TREE treat gap/missing/ambiguous characters?
 <div class="hline"></div>

 Gaps (`-`) and missing characters (`?` or `N` for DNA alignments) are treated in the same way as `unknown` characters, which represent no information. The same treatment holds for many other ML software (e.g., RAxML, PhyML). More explicitly,
-for a site (column) of an alignment containing `AC-AG-A` (i.e. A for sequence 1, C for sequence 2, `-` for sequence 3, and so on), the site-likelihood
-of a tree T is equal to the site-likelihood of the subtree of T restricted to those sequences containing non-gap characters (`ACAGA`).
+for a site (column) of an alignment containing `AC-AG-A` (i.e. A for sequence 1, C for sequence 2, `-` for sequence 3, and so on), the site-likelihood of a tree T is equal to the site-likelihood of the subtree of T restricted to those sequences containing non-gap characters (`ACAGA`).
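
The equality stated above can be checked numerically. The following is a minimal sketch, not IQ-TREE's implementation: it assumes a JC69 model and a star tree (one internal node joined to every tip), and the function names and branch lengths are made up for illustration. A gap tip contributes a partial-likelihood vector of all ones, so its factor sums to 1 and drops out of the product.

```python
import math

NUCS = "ACGT"

def jc69(t):
    """4x4 JC69 transition probabilities for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def tip_vector(ch):
    """Partial likelihoods at a tip: 1 for every state compatible with ch."""
    if ch in ("-", "?", "N"):
        return [1.0] * 4  # unknown: no information
    return [1.0 if n == ch else 0.0 for n in NUCS]

def star_site_likelihood(column, branch_lengths):
    """Site likelihood on a star tree with uniform base frequencies."""
    total = 0.0
    for x in range(4):  # state at the internal node
        term = 0.25
        for ch, t in zip(column, branch_lengths):
            p = jc69(t)
            tip = tip_vector(ch)
            term *= sum(p[x][y] * tip[y] for y in range(4))
        total += term
    return total

# Hypothetical branch lengths, one per sequence.
lengths = [0.1, 0.2, 0.3, 0.15, 0.25, 0.05, 0.4]
column = "AC-AG-A"

full = star_site_likelihood(column, lengths)
kept = [t for ch, t in zip(column, lengths) if ch != "-"]
restricted = star_site_likelihood("ACAGA", kept)

print(full, restricted)  # the two values agree up to rounding error
```

The same argument carries over to general trees, because each row of a transition matrix sums to 1 regardless of where the gap tip is attached.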

 Ambiguous characters that represent more than one character are also supported: each represented character will have equal likelihood. For DNA the following ambiguous nucleotides are supported according to [IUPAC nomenclature](https://en.wikipedia.org/wiki/Nucleic_acid_notation):

@@ -102,17 +101,20 @@ Ambiguous characters that represent more than one character are also supported:
 | H | A, C or T (next letter after G) |
 | D | A, G or T (next letter after C) |
 | V | A, G or C (next letter after T) |
-| ?, -, ., ~, O, N, X | A, G, C or T (unknown; all 4 nucleotides are equally likely) |
+| ?, -, ., ~, !, O, N, X | A, G, C or T (unknown; all 4 nucleotides are equally likely) |
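
One way to read the table is as a map from each symbol to the set of nucleotides it may stand for. The sketch below is illustrative only and is not taken from the IQ-TREE source. The two-letter codes (R, Y, W, S, M, K) and B follow the standard IUPAC nomenclature and do not appear in this hunk, and the names `IUPAC_DNA`, `UNKNOWN_DNA` and `tip_partials` are hypothetical.

```python
NUCS = "ACGT"

# Standard IUPAC nucleotide ambiguity codes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "B": "CGT", "H": "ACT", "D": "AGT", "V": "ACG",
}

# Symbols listed in the last table row: fully unknown.
UNKNOWN_DNA = set("?-.~!ONX")

def tip_partials(symbol):
    """Return a 0/1 vector over A, C, G, T; every allowed state gets equal weight."""
    symbol = symbol.upper()
    if symbol in UNKNOWN_DNA:
        return [1.0] * 4
    allowed = IUPAC_DNA[symbol]  # raises KeyError for unsupported symbols
    return [1.0 if n in allowed else 0.0 for n in NUCS]

print(tip_partials("H"))  # A, C or T
print(tip_partials("!"))  # unknown
```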

-For protein the following ambiguous amino-acids are supported:
+For protein sequences the following ambiguous amino acids are supported:
-| ?, -, ., ~, * or X | unknown AA (all 20 AAs are equally likely) |
+| ?, -, ., ~, *, ! or X | unknown AA (all 20 AAs are equally likely) |
+
+The symbols `*` and `!` may be found in alignments of protein and/or coding DNA sequences.
+A stop codon is typically translated to `*`. Some alignment programs also mark frameshift mutations (cf. [Ranwez et al., 2011]): a frameshift in a codon alignment leaves incomplete codons that cannot be translated unambiguously, so the affected position in the translated protein sequence and the padding positions in the corresponding codon are marked with `!`.
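
To illustrate how `*` and `!` can end up in a protein alignment, here is a minimal sketch of codon-wise translation. It assumes the standard genetic code and the convention described above, where any codon containing `!` translates to `!`. The `translate` helper is hypothetical and does not reproduce the behaviour of any particular alignment program.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(aligned_dna):
    """Translate an aligned coding sequence codon by codon."""
    protein = []
    usable = len(aligned_dna) - len(aligned_dna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = aligned_dna[i:i + 3].upper()
        if "!" in codon:
            protein.append("!")  # frameshift padding: incomplete codon
        elif codon == "---":
            protein.append("-")  # whole-codon gap
        else:
            # Stop codons map to "*"; anything else unrecognised becomes "X".
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)

print(translate("ATGAC!GCTTAA"))  # M!A*
```

In an analysis following the tables above, both `*` and `!` in the resulting protein sequence would then be treated as unknown amino acids.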


 Can I mix DNA and protein data in a partitioned analysis?