locateVariants() and predictCoding() silently drop large INDELs #81

@johnstonmj

I have encountered an issue with large INDELs being silently discarded from my results.

This is the offending VCF entry:
contig_xyz 133527 Sniffles2.DEL.50S0 GGATGAACTCTTGGATTCTTGGGCTCATGTCGGACAGGATGCCGCCCCAGCTTCTCAGGTTGATGCCCTGGGATCTAGAATCCAGCAGTGTGGGCAGCACCCGGAACAGCTTGAAGAAGTCCACGTTGGCGTACAGGGTATCCTCGATCCACTGCAGGGTTCCCTGGCTCAGACTGCACAGGGCATATCTGACGGTCTTGGCGCCTCTCCGCTGGCTGAAGATGATGAACCGTTCCAGCAGGGCCTCAGAACAGGCGATATCCTTCAGGGCGAGATCTGGCACGCCATGAGCAAACTGCTCGGGCCGCACTTGGCTGTTGATCAGCAGGTACACCACGCTGTCGCTCAGGCCGATGTTCTTGATGAGGAACAGTGTCAGGGTTTCCTCGTCCTTCAGGATGTCCCGGATTCTGATGCCCCTGCCGGCGATTCTCTCGGGGTGTGTTCTCAGGGTGTCCATGAACTGGCTCAGGATGTGCAGCTCGGTCCAGATTCTGCCCAGGTGCTGAGACTCAGGGGCGTTCATCAGCAGCTCTTGGAAGTCCCGGTACACTCTGGCCAGGATGCTGTTGTTGTAGTTGGACACGATGCCAGGGCTTTCGCCAGGTGTGGGGCTCTGAAAGCAGGGGTTGTTCACGTTGCAGAAGATGCCCTGCAGCCAAGGCAGCATTCCGGCAGAAGGCATGGCCTTGTTGGGGAAGTGACACTCGTGGTGGCTGTACAGAGGATTGGCGTTCCGCAGCCAGATCAGCACCAGAAACAGGCTCAGGGGCCACACGAGTTCCACCACGAATCTGATTTTCTGCCGCTTCCGCAGGGTCCAGTTCTTCCACAGCAGCAGCTGAATCTGCCGCACGAATCCCATGGTGGCCAGCTCGGTCGTCCCGGGGCCTCTACTGTCCAGAGTCCTCCGCGGATCCCGATCTGACGGTTCACTAAACGAGCTCTGTTTATATAGACCTCCCACCGTACACGCCTACCGCCCATTTGCGTCAACGGGGCGGGCGATCGCAGTTGTTACGACATTTTGGAAAGTCCCGTTGATTTTGGTGCCAAAACAAACTCCCATTGACGTCAATGGGGTGGAGACTTGGAAATCCCCGTGAGTCAAACCGCTATCCACGCCCATTGGTGTACTGCCAAAACCGCATCACCATGGTAATAGCGATGACTAATACGTAGATGTACTGCCAAGTAGGAAAGTCCCGTAAGGTCATGTACTGGGCATAATGCCAGGCGGGCCATTTACCGTCATTGACGTCAATAGGGGGCGGACTTGGCATATGATACACTTGATGTACTGCCAAGTGGGCAGTTTACCGTAAATACTCCACCCATTGACGTCAATGGAAAGTCCCTATTGGCGTTACTATGGGAACATACGTCATTATTGACGTCAATGGGCGGGGGTCGTTGGGCGGTCAGCCAGGCGGGCCATTTACCGTAAGTTATGTAACGCGGAACTCCATATATGGGCTATGAACTAATGACCCCGTAATTGATTACTATTAATAACTAGTCAATAATCAATGCCAACATGGCGGTCATATTGGACATGAGCCAATATAAATGTACATATTATGATATAGATACAACGTATGCAATGGCCAATAGCCAATATTGATTTATGCTATATAACCAATGAATAATATGGCTAATGGCCAATATTGAAGATCCCCGGGTACCGAGCTCGAATTCATCGATGATGATCCACTAGTAACGGCCGCCAGTGTGCTGGAATTCGCCCTTCCCGCATGGCATCTCATTACCGCCCGATCCGGCGGTTTCCGCTTCCGTTCCGCATGCTAACGAGGAACGGGCAGGGGGCGGGGCCCGGGCCCCGACTTCCCGGTTCGGCGGTAATGTGATACGAGCCCCGCGCGCCCGTTGGCCGTCCCCGGGCCCCCGGTCCCGCCCGCCGGACGCCGGGACCAACGGGACGGCGGGCGGCCCTTGGGCCGCC
CGCCTTGCCGCCCCCCCATTGGCCGGCGGGCGGGACCGCCCCAAGGGGGCGGGGCCGCCGGGTAAAAGAAGTGAGAACGCGAAGCGTTCGCACTTCGTCCCAATATATATATATTATTAGGGCGAAGTGCGAGCACTGGCGCCGTGCCCGACTCCGCGCCGGCCCCGGGGGCGGACCCGGGCGGCGGGGGGCGGGTCTCTCCGGCGCACATAAAGGCCCGGCGCGACCGACGCCCGCAGACGGCGCCGGCCACGAACGACGGGAGCGGCTGCGGAGCACGCGGACCGGGAGCGGGAGTCGCAGAGGGCCGTCGGAGCGGACGGCGTCGGCATCGCGACGCCCCGGCTCGGGATCGGGATCGCATCGGAAAGGGACACGCGGACGCGGGGGGGAAAGACCCGCCCACCCCACCCACGAAACACAGGGGACGCACCCCGGGGGCCTCCGACGACAGAAACCCACCGGTCCGCCTTTTTTGCACGGGTAAGCACCTTGGGTGGGCGGAGGAGGGGGGACGCGGGGGCGGAGGAGGGGGGACGCGGGGGCCGGAGGAGGGGGGACGCGGGGGCGGAGGAGGGGG G 59 PASS PRECISE;SVTYPE=DEL;SVLEN=-2542;END=136069;SUPPORT=259;COVERAGE=335,326,308,260,257;STRAND=+-;AF=0.869;STDEV_LEN=0;STDEV_POS=0 GT:GQ:DR:DV 1/1:60:39:259

This represents a deletion of 2542 bp.

The large deletion is present when I call:
rowRanges(vcf)

So the record is being read successfully by readVcf().

However, when I call:

locateVariants(
  vcf,
  txdb,
  AllVariants(
    promoter = PromoterVariants(0, 0),
    intergenic = IntergenicVariants(0, 0)
  )
)

or
predictCoding(vcf, txdb, seqSource=fasta)

this variant position is not included among the results.
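
For anyone trying to reproduce this, here is a minimal sketch of how the missing records can be listed by comparing the input indices against the QUERYID column of the result. `find_dropped` is a hypothetical helper name, not part of VariantAnnotation, and the commented usage assumes the same `vcf` and `txdb` objects as above.

```r
# Hypothetical helper (not part of VariantAnnotation): given the number of
# input records and the QUERYID values present in a result, return the
# indices of input records that never appear in that result.
find_dropped <- function(n_input, query_ids) {
  setdiff(seq_len(n_input), unique(query_ids))
}

# Intended usage with the objects from this report (needs VariantAnnotation;
# left as comments so the helper above runs on its own):
# loc     <- locateVariants(vcf, txdb, AllVariants())
# dropped <- find_dropped(length(rowRanges(vcf)), loc$QUERYID)
# rowRanges(vcf)[dropped]     # records missing from the result
# width(ref(vcf))[dropped]    # REF lengths of the missing records
```

The same comparison should work on the output of predictCoding(), which also carries a QUERYID column.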

With some trial and error, I have determined that truncating the REF sequence to 850 characters allows the variant to be retained, but truncating it to 900 characters does not.

Succeeds, REF truncated to 850 bp:
contig_xyz 133527 Sniffles2.DEL.50S0 GGATGAACTCTTGGATTCTTGGGCTCATGTCGGACAGGATGCCGCCCCAGCTTCTCAGGTTGATGCCCTGGGATCTAGAATCCAGCAGTGTGGGCAGCACCCGGAACAGCTTGAAGAAGTCCACGTTGGCGTACAGGGTATCCTCGATCCACTGCAGGGTTCCCTGGCTCAGACTGCACAGGGCATATCTGACGGTCTTGGCGCCTCTCCGCTGGCTGAAGATGATGAACCGTTCCAGCAGGGCCTCAGAACAGGCGATATCCTTCAGGGCGAGATCTGGCACGCCATGAGCAAACTGCTCGGGCCGCACTTGGCTGTTGATCAGCAGGTACACCACGCTGTCGCTCAGGCCGATGTTCTTGATGAGGAACAGTGTCAGGGTTTCCTCGTCCTTCAGGATGTCCCGGATTCTGATGCCCCTGCCGGCGATTCTCTCGGGGTGTGTTCTCAGGGTGTCCATGAACTGGCTCAGGATGTGCAGCTCGGTCCAGATTCTGCCCAGGTGCTGAGACTCAGGGGCGTTCATCAGCAGCTCTTGGAAGTCCCGGTACACTCTGGCCAGGATGCTGTTGTTGTAGTTGGACACGATGCCAGGGCTTTCGCCAGGTGTGGGGCTCTGAAAGCAGGGGTTGTTCACGTTGCAGAAGATGCCCTGCAGCCAAGGCAGCATTCCGGCAGAAGGCATGGCCTTGTTGGGGAAGTGACACTCGTGGTGGCTGTACAGAGGATTGGCGTTCCGCAGCCAGATCAGCACCAGAAACAGGCTCAGGGGCCACACGAGTTCCACCACGAATCTGATTTTCTGCCGCTTCCGCAGGGTCCAGTTCTTCCACAGCAGCAGCTGAATCTG G 59 PASS PRECISE;SVTYPE=DEL;SVLEN=-2542;END=136069;SUPPORT=259;COVERAGE=335,326,308,260,257;STRAND=+-;AF=0.869;STDEV_LEN=0;STDEV_POS=0 GT:GQ:DR:DV 1/1:60:39:259

Fails, REF truncated to 900 bp:
contig_xyz 133527 Sniffles2.DEL.50S0 GGATGAACTCTTGGATTCTTGGGCTCATGTCGGACAGGATGCCGCCCCAGCTTCTCAGGTTGATGCCCTGGGATCTAGAATCCAGCAGTGTGGGCAGCACCCGGAACAGCTTGAAGAAGTCCACGTTGGCGTACAGGGTATCCTCGATCCACTGCAGGGTTCCCTGGCTCAGACTGCACAGGGCATATCTGACGGTCTTGGCGCCTCTCCGCTGGCTGAAGATGATGAACCGTTCCAGCAGGGCCTCAGAACAGGCGATATCCTTCAGGGCGAGATCTGGCACGCCATGAGCAAACTGCTCGGGCCGCACTTGGCTGTTGATCAGCAGGTACACCACGCTGTCGCTCAGGCCGATGTTCTTGATGAGGAACAGTGTCAGGGTTTCCTCGTCCTTCAGGATGTCCCGGATTCTGATGCCCCTGCCGGCGATTCTCTCGGGGTGTGTTCTCAGGGTGTCCATGAACTGGCTCAGGATGTGCAGCTCGGTCCAGATTCTGCCCAGGTGCTGAGACTCAGGGGCGTTCATCAGCAGCTCTTGGAAGTCCCGGTACACTCTGGCCAGGATGCTGTTGTTGTAGTTGGACACGATGCCAGGGCTTTCGCCAGGTGTGGGGCTCTGAAAGCAGGGGTTGTTCACGTTGCAGAAGATGCCCTGCAGCCAAGGCAGCATTCCGGCAGAAGGCATGGCCTTGTTGGGGAAGTGACACTCGTGGTGGCTGTACAGAGGATTGGCGTTCCGCAGCCAGATCAGCACCAGAAACAGGCTCAGGGGCCACACGAGTTCCACCACGAATCTGATTTTCTGCCGCTTCCGCAGGGTCCAGTTCTTCCACAGCAGCAGCTGAATCTGCCGCACGAATCCCATGGTGGCCAGCTCGGTCGTCCCGGGGCCTCTACTGT G 59 PASS PRECISE;SVTYPE=DEL;SVLEN=-2542;END=136069;SUPPORT=259;COVERAGE=335,326,308,260,257;STRAND=+-;AF=0.869;STDEV_LEN=0;STDEV_POS=0 GT:GQ:DR:DV 1/1:60:39:259
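
The truncation test can be scripted rather than edited by hand. This is a rough sketch in base R, assuming a tab-delimited VCF with REF in column 4; `truncate_ref` and `truncate_vcf` are hypothetical helper names, and the INFO fields (SVLEN, END) are deliberately left unchanged, as in the entries above.

```r
# Hypothetical helper: shorten the REF field (column 4) of one tab-delimited
# VCF data line to at most `n` bases, leaving all other fields untouched.
truncate_ref <- function(vcf_line, n) {
  fields <- strsplit(vcf_line, "\t", fixed = TRUE)[[1]]
  stopifnot(length(fields) >= 5)
  fields[4] <- substr(fields[4], 1, n)
  paste(fields, collapse = "\t")
}

# Hypothetical helper: apply truncate_ref() to every data line of a VCF file,
# copying header lines (starting with "#") and empty lines through unchanged.
truncate_vcf <- function(in_path, out_path, n) {
  lines <- readLines(in_path)
  is_record <- nzchar(lines) & !startsWith(lines, "#")
  lines[is_record] <- vapply(lines[is_record], truncate_ref, character(1),
                             n = n, USE.NAMES = FALSE)
  writeLines(lines, out_path)
  invisible(out_path)
}
```

Calling `truncate_vcf()` with a range of `n` values and re-running locateVariants() on each output would narrow down the exact length at which the record disappears.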

Can you suggest a way to retain these large INDELs in the results?
Alternatively, if it is not possible to retain large INDELs, could a warning or error be raised instead of silently discarding them?

Thanks!
