The Snippy_Variants workflow aligns single-end or paired-end reads against a reference genome, then identifies single-nucleotide polymorphisms (SNPs), multi-nucleotide polymorphisms (MNPs), and insertions/deletions (INDELs) across the alignment. If a GenBank file is used as the reference, mutations associated with user-specified query strings (e.g. genes of interest) can additionally be reported to the Terra data table.

Example use cases

Finding mutations (SNPs, MNPs, and INDELs) in your own sample’s reads relative to a reference, e.g. mutations in genes of phenotypic interest.
Quality control: Contamination between multiple closely related genomes (e.g. isolates from an outbreak or transmission cluster) is difficult to identify using the conventional quality-control approaches in TheiaProk. Such contamination may appear as allele heterogeneity at a significant number of genome positions. Snippy_Variants can identify these heterogeneous positions by aligning reads either to the assembly of the same reads or to a closely related reference genome, with the thresholds for calling SNPs lowered.
Assessing support for a mutation: Snippy_Variants produces a BAM file of the reads aligned to the reference genome. This BAM file can be visualized in IGV (see Theiagen Office Hours recordings) to assess where a mutation falls within its supporting reads or, if the assembly of the reads was used as the reference, where it falls within the contig.
- Mutations found only at the ends of supporting reads may be sequencing errors.
- Mutations found at the ends of contigs may be assembly errors.
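
For the quality-control use case above, allele heterogeneity at a position can be estimated from per-allele read counts. The following is a minimal sketch, not part of the workflow itself. It assumes evidence strings of the form `G:44 A:3` (allele, colon, supporting read count), which is how snippy is understood to report evidence in its tabular output; the function names and the 10% threshold are hypothetical and should be adjusted to your data.

```python
def parse_evidence(evidence):
    """Parse a string such as 'G:44 A:3' into {'G': 44, 'A': 3}.

    The 'ALLELE:COUNT' layout is an assumption about the evidence format.
    """
    counts = {}
    for token in evidence.split():
        allele, _, depth = token.rpartition(":")
        counts[allele] = int(depth)
    return counts


def minor_allele_fraction(evidence):
    """Fraction of reads that do not support the most common allele."""
    counts = parse_evidence(evidence)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - max(counts.values()) / total


def is_heterogeneous(evidence, min_fraction=0.1):
    """Flag a position whose minor allele fraction reaches min_fraction.

    min_fraction=0.1 is an illustrative threshold, not a recommended value.
    """
    return minor_allele_fraction(evidence) >= min_fraction
```

A sample with many flagged positions spread across the genome would be a candidate for closer inspection of the alignment in IGV.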

Inputs

Snippy_Variants Inputs

Single-end or paired-end reads from Illumina or IonTorrent sequencing can be used. For single-end data, simply omit the attribute for read2.
The reference file should be in FASTA (e.g. .fa, .fasta) or full GenBank (.gbk) format. The mutations identified by Snippy_Variants are highly dependent on the choice of reference genome. Mutations cannot be identified in genomic regions that are present in your query sequence but absent from the reference.

<aside> 💡 The query string can be a gene name or any other annotation, but it must match the text in the GenBank file/output VCF EXACTLY.

</aside>

Workflow Tasks

Snippy_Variants uses the snippy tool to align reads to the reference and call SNPs, MNPs and INDELs according to optional input parameters. The output includes a file of variants that is then queried using the grep bash command to identify any mutations in specified genes or annotations of interest. The query string MUST match the gene name or annotation as specified in the GenBank file and provided in the output variant file in the snippy_results column.
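
The exact-match requirement can be illustrated with a small Python sketch of a grep-like search. This is not the workflow's actual command, which is not shown here. The function name, the reduced set of columns, and the example rows are all hypothetical and made up for illustration.

```python
import csv
import io

# Hypothetical, made-up variant rows with a reduced set of columns.
SAMPLE_VARIANTS = """CHROM,POS,TYPE,REF,ALT,GENE,EFFECT
contig1,1200,snp,C,T,gyrA,missense_variant
contig1,5400,del,GA,G,parC,frameshift_variant
contig2,310,snp,A,G,,
"""


def find_query_hits(variant_csv_text, queries):
    """Return, per query string, the rows whose text contains that string.

    Matching is a case-sensitive substring test, similar to plain grep.
    """
    reader = csv.reader(io.StringIO(variant_csv_text))
    header = next(reader)
    hits = {query: [] for query in queries}
    for row in reader:
        line = ",".join(row)
        for query in queries:
            if query in line:
                hits[query].append(dict(zip(header, row)))
    return hits


hits = find_query_hits(SAMPLE_VARIANTS, ["gyrA", "gyra", "parC"])
for query, rows in hits.items():
    print(query, len(rows))
```

In this sketch `gyrA` and `parC` each match one row, while `gyra` matches none because the case differs. A substring test also means that a short query can match longer names that contain it, so a specific query string is safer than a short one.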

Outputs

Full outputs

The output BAM/BAI files can be visualized using IGV in Terra to manually assess read placement and SNP support. See Theiagen Office Hours recordings for instructions on visualizing this alignment against the reference.
