1. What were you trying to do?
I was trying to compute the snarls on a little graph by running both vg snarls and vg deconstruct -a
2. What did you want to happen?
I thought vg snarls and vg deconstruct -a would output the same snarls
3. What actually happened?
I got two different outputs: two snarls are "missing" from the vg deconstruct output.
Output of vg snarls :
{"end": {"node_id": "16"}, "end_self_reachable": true, "start": {"node_id": "1"}, "start_end_reachable": true, "start_self_reachable": true}
{"directed_acyclic_net_graph": true, "end": {"node_id": "15"}, "end_self_reachable": true, "parent": {"end": {"node_id": "16"}, "start": {"node_id": "1"}}, "start": {"node_id": "11"}, "start_end_reachable": true}
{"directed_acyclic_net_graph": true, "end": {"node_id": "8"}, "end_self_reachable": true, "parent": {"end": {"node_id": "16"}, "start": {"node_id": "1"}}, "start": {"node_id": "2"}, "start_end_reachable": true}
Output of vg deconstruct :
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT inv
ref 100 >1>16 CTTGACTTAGGCCAATACCTTTTTGTGTCTTGACCCCTGCAAATACGTTAGTGGTTCGTGACCGACTTCAGGGTCCCTGACCT CTCCCAAAGGTATTGGCCTAAGTCAACAAAACACAAACTAACGTATTTGCAGGGGTGTGACGAACCAGGTCAGGGACCCTGAAGTCGT 60 . AC=1;AF=1;AN=1;AT=>1>2>3>4>8>10>9>11>13>14>15>16,>1>2<5<6<4<3<2<7<9<10<12<15<14<13>15>16;NS=1;LV=0;RC=ref;RS=100;RD=183 GT 1
Also got "[vg deconstruct] Using flat processing" in stderr
I see the same behavior on several graphs, and I would like to understand whether there is a reason for it.
5. What data and command can the vg dev team use to make the problem happen?
Here is the GFA graph I used to produce this output :
H VN:Z:1.0
S 1 CTGGGTAAACCTGGGTAAACCTGGGTAAACCTGGGTAAACCTGGGTAAACCTGGGTAAACCTGGGTAAACCTGGGTAAACCTGGGTAAACCTGGGTAAAC
S 2 T
S 3 TGACTTAGGC
S 4 CAATACCTTT
S 5 GG
S 6 G
S 7 TTGTGTTTTG
S 8 TTGTGTCTTG
S 9 ATACGTTAGT
S 10 ACCCCTGCAA
S 11 GGTTCGTGAC
S 12 GGTTCGTCAC
S 13 CGACTTCAGG
S 14 GTCCCTGACC
S 15 T
S 16 CGAGTAGGCGCGAGTAGGCGCGAGTAGGCGCGAGTAGGCGCGAGTAGGCGCGAGTAGGCGCGAGTAGGCGCGAGTAGGCGCGAGTAGGCGCGAGTAGGCGCGAGTAGGCG
L 2 + 3 + 0M
L 9 - 10 - 0M
L 12 - 15 - 0M
L 13 + 14 + 0M
L 10 + 9 + 0M
L 4 - 3 - 0M
L 11 + 13 + 0M
L 8 + 10 + 0M
L 3 - 2 - 0M
L 4 + 8 + 0M
L 9 + 11 + 0M
L 15 - 14 - 0M
L 3 + 4 + 0M
L 13 - 15 + 0M
L 5 - 6 - 0M
L 6 - 4 - 0M
L 14 + 15 + 0M
L 10 - 12 - 0M
L 2 - 7 - 0M
L 1 + 2 + 0M
L 14 - 13 - 0M
L 7 - 9 - 0M
L 15 + 16 + 0M
L 2 + 5 - 0M
P ref 1+,2+,3+,4+,8+,10+,9+,11+,13+,14+,15+,16+ ,,,,,,,,,,,
P inv 1+,2+,5-,6-,4-,3-,2-,7-,9-,10-,12-,15-,14-,13-,15+,16+ ,,,,,,,,,,,,,,,
The commands I used are :
vg convert -f -W graph.gfa > graph.vg
vg snarls graph.vg > snarl_output.snarls
vg view -R snarl_output.snarls > snarls.json
vg deconstruct -a -p ref graph.vg > snarls.vcf (also tried with -r vg_snarls_output.snarls)
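To make the difference between the two outputs explicit, here is a minimal sketch of how I compare them. It is not part of vg. It assumes that snarls.json holds one JSON object per line (as printed above), that a visit without a "backward" field is in forward orientation, and that the VCF ID column names a snarl as >start>end. The function names are my own, and a snarl reported in the opposite orientation in the VCF would not be matched.

```python
import json


def snarl_ids_from_json(lines):
    """Build '>start>end' style IDs from one-JSON-object-per-line snarl records."""
    ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        snarl = json.loads(line)
        parts = []
        for key in ("start", "end"):
            visit = snarl[key]
            # Assumption: a missing "backward" field means forward orientation.
            arrow = "<" if visit.get("backward", False) else ">"
            parts.append(f"{arrow}{visit['node_id']}")
        ids.add("".join(parts))
    return ids


def snarl_ids_from_vcf(lines):
    """Collect the ID column (third field) of every non-header VCF record."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        # Split on any whitespace so that tab- and space-separated pastes both work.
        fields = line.split(None, 3)
        if len(fields) >= 3:
            ids.add(fields[2])
    return ids


def missing_from_vcf(json_lines, vcf_lines):
    """Return snarl IDs present in the snarls JSON but absent from the VCF."""
    return snarl_ids_from_json(json_lines) - snarl_ids_from_vcf(vcf_lines)
```

Calling missing_from_vcf(open("snarls.json"), open("snarls.vcf")) on the files produced by the commands above should list the two nested snarls, >11>15 and >2>8, since only >1>16 appears in the VCF.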
6. What does running vg version say?
vg version v1.73.0 "Ducky"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Using HTSlib headers 101990, library 1.19.1-29-g3cfe8769
Built by anovak@courtyard.gi.ucsc.edu