Overview

Find_Shared_Variants_PHB is a workflow for concatenating the variant results produced by the Snippy_Variants_PHB workflow across multiple samples and reshaping the data to illustrate variants that are shared among multiple samples.

Inputs

The primary intended input of the workflow is the snippy_variants_results output from Snippy_Variants_PHB or the theiaeuk_snippy_variants_results output of the TheiaEuk workflow. Variant results files from other tools may not be compatible at this time.

All variant data included in the sample set should be generated from aligning sequencing reads to the same reference genome. If variant data was generated using different reference genomes, shared variants cannot be identified and results will be less useful.

Terra Inputs

Tasks

Concatenate Variants Task

The cat_variants task concatenates variant data from multiple samples into a single file concatenated_variants. It is very similar to the cat_files task, but also adds a column to the output file that indicates the sample associated with each row of data.

The concatenated_variants file will be in the following format:

samplename	CHROM	POS	TYPE	REF	ALT	EVIDENCE	FTYPE	STRAND	NT_POS	AA_POS	EFFECT	LOCUS_TAG	GENE
sample1	PEKT02000007	5224	snp	C	G	G:21 C:0
sample2	PEKT02000007	34112	snp	C	G	G:32 C:0	CDS	+	153/1620	51/539	missense_variant c.153C>G p.His51Gln	B9J08_002604	hypothetical protein
sample3	PEKT02000007	34487	snp	T	A	A:41 T:0	CDS	+	528/1620	176/539	missense_variant c.528T>A p.Asn176Lys	B9J08_002604	hypothetical protein

Technical details

Shared Variants Task

The shared_variants task takes in the concatenated_variants output from the cat_variants task and reshapes the data so that variants are rows and samples are columns. For each variant, samples where the variant was detected are populated with a “1” and samples were either the variant was not detected or there was insufficient coverage to call variants are populated with a “0”. The resulting table is available as the shared_variants_table output.

The shared_variants_table file will be in the following format:

CHROM	POS	TYPE	REF	ALT	FTYPE	STRAND	NT_POS	AA_POS	EFFECT	LOCUS_TAG	GENE	PRODUCT	sample1	sample2	sample3
PEKT02000007	2693938	snp	T	C	CDS	-	1008/3000	336/999	synonymous_variant c.1008A>G p.Lys336Lys	B9J08_003879	NA	chitin synthase 1	1	1	0
PEKT02000007	2529234	snp	G	C	CDS	+	282/336	94/111	missense_variant c.282G>C p.Lys94Asn	B9J08_003804	NA	cytochrome c	1	1	1
PEKT02000002	1043926	snp	A	G	CDS	-	542/1464	181/487	missense_variant c.542T>C p.Ile181Thr	B9J08_000976	NA	dihydrolipoyl dehydrogenase	1	1	0

Technical details

Outputs

The outputs of this workflow are the concatenated_variants file and the shared_variants_table file.

Terra Outputs

References

✉️ [email protected] | X (formerly Twitter) | LinkedIn | 🌐 Website