Overview

Find_Shared_Variants_PHB is a workflow for concatenating the variant results produced by the Snippy_Variants_PHB workflow across multiple samples and reshaping the data to illustrate variants that are shared among multiple samples.

Inputs

The primary intended input of the workflow is the snippy_variants_results output from Snippy_Variants_PHB or the theiaeuk_snippy_variants_results output of the TheiaEuk workflow. Variant results files from other tools may not be compatible at this time.

All variant data in the sample set should be generated by aligning sequencing reads to the same reference genome. Variants are described by reference-genome coordinates (CHROM and POS), so if different samples were aligned to different reference genomes, shared variants cannot be identified and the results will be of limited use.

Terra Inputs

Tasks

Concatenate Variants Task

The cat_variants task concatenates variant data from multiple samples into a single file, concatenated_variants. It is very similar to the cat_files task, but it also adds a column (samplename) to the output file that indicates the sample associated with each row of data.

The concatenated_variants file will be in the following format:

| samplename | CHROM | POS | TYPE | REF | ALT | EVIDENCE | FTYPE | STRAND | NT_POS | AA_POS | EFFECT | LOCUS_TAG | GENE | PRODUCT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sample1 | PEKT02000007 | 5224 | snp | C | G | G:21 C:0 | | | | | | | | |
| sample2 | PEKT02000007 | 34112 | snp | C | G | G:32 C:0 | CDS | + | 153/1620 | 51/539 | missense_variant c.153C>G p.His51Gln | B9J08_002604 | | hypothetical protein |
| sample3 | PEKT02000007 | 34487 | snp | T | A | A:41 T:0 | CDS | + | 528/1620 | 176/539 | missense_variant c.528T>A p.Asn176Lys | B9J08_002604 | | hypothetical protein |
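The concatenation step can be sketched in a few lines of standard-library Python. This is not the cat_variants implementation: it assumes that each sample's Snippy_Variants results file is tab-delimited with a header row (an assumption; the file format is not specified here), and the helper names `concatenate_variants` and `to_tsv` are hypothetical.

```python
import csv
import io
from typing import Dict, List


def concatenate_variants(per_sample: Dict[str, str]) -> List[Dict[str, str]]:
    """Concatenate per-sample variant tables, tagging each row with its sample.

    per_sample maps a sample name to the text of that sample's variant file.
    Assumption (hypothetical): each file is tab-delimited with a header line.
    """
    rows: List[Dict[str, str]] = []
    for samplename, text in per_sample.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for record in reader:
            # Put the sample column first so every row records its origin.
            rows.append({"samplename": samplename, **record})
    return rows


def to_tsv(rows: List[Dict[str, str]], fieldnames: List[str]) -> str:
    """Serialize rows as tab-delimited text; keys not in fieldnames are dropped."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, delimiter="\t",
        lineterminator="\n", extrasaction="ignore",
    )
    writer.writeheader()
    writer.writerows(rows)  # missing or None values are written as empty fields
    return buf.getvalue()


if __name__ == "__main__":
    header = "CHROM\tPOS\tTYPE\tREF\tALT\n"
    files = {
        "sample1": header + "PEKT02000007\t5224\tsnp\tC\tG\n",
        "sample2": header + "PEKT02000007\t34112\tsnp\tC\tG\n",
    }
    merged = concatenate_variants(files)
    print(to_tsv(merged, ["samplename", "CHROM", "POS", "TYPE", "REF", "ALT"]), end="")
```

Fields that are absent for a given row (for example, the annotation columns of an intergenic variant such as the sample1 row above) are simply written as empty cells.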

Shared Variants Task

The shared_variants task takes the concatenated_variants output from the cat_variants task and reshapes the data so that variants are rows and samples are columns. For each variant, samples in which the variant was detected are populated with “1”, and samples in which the variant was not detected, or in which coverage was insufficient to call variants, are populated with “0”. The resulting table is available as the shared_variants_table output.

The shared_variants_table file will be in the following format:

| CHROM | POS | TYPE | REF | ALT | FTYPE | STRAND | NT_POS | AA_POS | EFFECT | LOCUS_TAG | GENE | PRODUCT | sample1 | sample2 | sample3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PEKT02000007 | 2693938 | snp | T | C | CDS | - | 1008/3000 | 336/999 | synonymous_variant c.1008A>G p.Lys336Lys | B9J08_003879 | NA | chitin synthase 1 | 1 | 1 | 0 |
| PEKT02000007 | 2529234 | snp | G | C | CDS | + | 282/336 | 94/111 | missense_variant c.282G>C p.Lys94Asn | B9J08_003804 | NA | cytochrome c | 1 | 1 | 1 |
| PEKT02000002 | 1043926 | snp | A | G | CDS | - | 542/1464 | 181/487 | missense_variant c.542T>C p.Ile181Thr | B9J08_000976 | NA | dihydrolipoyl dehydrogenase | 1 | 1 | 0 |
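The reshaping is essentially a presence/absence pivot. The minimal Python sketch below illustrates the idea; it is not the shared_variants task's code. It assumes the concatenated rows are available as dictionaries keyed by column name, and the names `shared_variants` and `VARIANT_COLUMNS` are hypothetical. It also does not reproduce output details such as the "NA" shown in the GENE column above.

```python
from typing import Dict, List, Set, Tuple

# Columns that describe a variant. EVIDENCE is left out because its read
# counts are sample-specific (it does not appear in shared_variants_table).
VARIANT_COLUMNS = [
    "CHROM", "POS", "TYPE", "REF", "ALT", "FTYPE", "STRAND",
    "NT_POS", "AA_POS", "EFFECT", "LOCUS_TAG", "GENE", "PRODUCT",
]


def shared_variants(rows: List[Dict[str, str]], samples: List[str]) -> List[Dict[str, str]]:
    """Pivot long-format rows (one per sample-variant pair) into variants x samples."""
    presence: Dict[Tuple[str, ...], Set[str]] = {}
    for row in rows:
        # Missing annotation fields (e.g. intergenic variants) become "".
        key = tuple(row.get(col) or "" for col in VARIANT_COLUMNS)
        presence.setdefault(key, set()).add(row["samplename"])

    table: List[Dict[str, str]] = []
    for key, found_in in presence.items():
        record = dict(zip(VARIANT_COLUMNS, key))
        for sample in samples:
            # "1" = variant called in this sample; "0" = not called, which
            # covers both true absence and insufficient coverage.
            record[sample] = "1" if sample in found_in else "0"
        table.append(record)
    return table


if __name__ == "__main__":
    # Toy rows for illustration only.
    rows = [
        {"samplename": "sample1", "CHROM": "contig1", "POS": "100"},
        {"samplename": "sample2", "CHROM": "contig1", "POS": "100"},
        {"samplename": "sample2", "CHROM": "contig2", "POS": "250"},
    ]
    for rec in shared_variants(rows, ["sample1", "sample2", "sample3"]):
        print(rec["CHROM"], rec["POS"], rec["sample1"], rec["sample2"], rec["sample3"])
```

Running the demo prints `contig1 100 1 1 0` and `contig2 250 0 1 0`. The sample list is passed in explicitly so that a sample with no variant calls still receives a column of zeros rather than disappearing from the table.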

Outputs

The outputs of this workflow are the concatenated_variants file and the shared_variants_table file.
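As an illustration of downstream use, the sketch below reads a shared_variants_table and lists variants detected in at least a given number of samples. It is not part of the workflow. It assumes the file is tab-delimited and that every column after PRODUCT is a sample column holding "1" or "0", as in the example layout above; `variants_shared_by` and `min_samples` are hypothetical names.

```python
import csv
import io
from typing import List, Tuple

# Assumption: per-sample columns start immediately after PRODUCT.
LAST_ANNOTATION_COLUMN = "PRODUCT"


def variants_shared_by(table_text: str, min_samples: int = 2) -> List[Tuple[str, str, List[str]]]:
    """Return (CHROM, POS, samples) for variants flagged "1" in >= min_samples samples.

    Assumption (hypothetical): table_text is tab-delimited with a header row.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    first_sample = header.index(LAST_ANNOTATION_COLUMN) + 1
    sample_names = header[first_sample:]
    chrom_i, pos_i = header.index("CHROM"), header.index("POS")

    shared = []
    for row in reader:
        # Samples whose presence/absence flag is "1" for this variant.
        carriers = [name for name, flag in zip(sample_names, row[first_sample:]) if flag == "1"]
        if len(carriers) >= min_samples:
            shared.append((row[chrom_i], row[pos_i], carriers))
    return shared


if __name__ == "__main__":
    # Toy table for illustration only.
    demo = (
        "CHROM\tPOS\tPRODUCT\tsample1\tsample2\tsample3\n"
        "contig1\t100\tx\t1\t1\t0\n"
        "contig2\t250\ty\t0\t1\t0\n"
    )
    print(variants_shared_by(demo))
```

Because “0” covers both “not detected” and “insufficient coverage to call variants”, a sample missing from a variant's carrier list here is not proof that the sample lacks that variant.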

Terra Outputs
