Find_Shared_Variants_PHB is a workflow for concatenating the variant results produced by the Snippy_Variants_PHB workflow across multiple samples and reshaping the data to illustrate variants that are shared among multiple samples.
The primary intended input of the workflow is the snippy_variants_results output from Snippy_Variants_PHB or the theiaeuk_snippy_variants_results output of the TheiaEuk workflow. Variant results files from other tools may not be compatible at this time.
All variant data included in the sample set should be generated by aligning sequencing reads to the same reference genome. If variant data were generated using different reference genomes, variant positions are not comparable across samples, so shared variants cannot be reliably identified and the results will be less useful.
The cat_variants task concatenates variant data from multiple samples into a single file, concatenated_variants. It is very similar to the cat_files task, but also adds a column to the output file that indicates the sample associated with each row of data.
The concatenated_variants file will be in the following format:
samplename | CHROM | POS | TYPE | REF | ALT | EVIDENCE | FTYPE | STRAND | NT_POS | AA_POS | EFFECT | LOCUS_TAG | GENE | PRODUCT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
sample1 | PEKT02000007 | 5224 | snp | C | G | G:21 C:0 | | | | | | | | |
sample2 | PEKT02000007 | 34112 | snp | C | G | G:32 C:0 | CDS | + | 153/1620 | 51/539 | missense_variant c.153C>G p.His51Gln | B9J08_002604 | hypothetical protein | |
sample3 | PEKT02000007 | 34487 | snp | T | A | A:41 T:0 | CDS | + | 528/1620 | 176/539 | missense_variant c.528T>A p.Asn176Lys | B9J08_002604 | hypothetical protein | |
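The concatenation step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the workflow's actual task code: the function name `cat_variants` is borrowed from the task, while the input shape (a dictionary mapping sample names to CSV text) and the assumption that the per-sample results are comma-separated with a header row are hypothetical choices made for the example.

```python
import csv
import io


def cat_variants(variant_csvs):
    """Concatenate per-sample variant tables and add a samplename column.

    variant_csvs maps a sample name to the CSV text of that sample's
    variant results (hypothetical input shape, chosen for illustration).
    """
    out = io.StringIO()
    writer = None
    for samplename, text in variant_csvs.items():
        reader = csv.DictReader(io.StringIO(text))
        if writer is None:
            # The first sample's header defines the output columns.
            fieldnames = ["samplename"] + list(reader.fieldnames)
            writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
            writer.writeheader()
        for row in reader:
            # Prefix every row with the sample it came from.
            writer.writerow({"samplename": samplename, **row})
    return out.getvalue()


# Two tiny per-sample tables using a subset of the columns shown above.
example = {
    "sample1": "CHROM,POS,TYPE,REF,ALT,EVIDENCE\nPEKT02000007,5224,snp,C,G,G:21 C:0\n",
    "sample2": "CHROM,POS,TYPE,REF,ALT,EVIDENCE\nPEKT02000007,34112,snp,C,G,G:32 C:0\n",
}
concatenated = cat_variants(example)
print(concatenated)
```

The sketch assumes every sample's file has the same header; `csv.DictWriter` raises a `ValueError` if a later file contains a column that the first one did not.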
The shared_variants task takes in the concatenated_variants output from the cat_variants task and reshapes the data so that variants are rows and samples are columns. For each variant, samples where the variant was detected are populated with a “1”, and samples where either the variant was not detected or there was insufficient coverage to call variants are populated with a “0”. The resulting table is available as the shared_variants_table output.
The shared_variants_table file will be in the following format:
CHROM | POS | TYPE | REF | ALT | FTYPE | STRAND | NT_POS | AA_POS | EFFECT | LOCUS_TAG | GENE | PRODUCT | sample1 | sample2 | sample3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PEKT02000007 | 2693938 | snp | T | C | CDS | - | 1008/3000 | 336/999 | synonymous_variant c.1008A>G p.Lys336Lys | B9J08_003879 | NA | chitin synthase 1 | 1 | 1 | 0 |
PEKT02000007 | 2529234 | snp | G | C | CDS | + | 282/336 | 94/111 | missense_variant c.282G>C p.Lys94Asn | B9J08_003804 | NA | cytochrome c | 1 | 1 | 1 |
PEKT02000002 | 1043926 | snp | A | G | CDS | - | 542/1464 | 181/487 | missense_variant c.542T>C p.Ile181Thr | B9J08_000976 | NA | dihydrolipoyl dehydrogenase | 1 | 1 | 0 |
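The reshaping into a presence/absence table can be sketched as follows. This is a minimal illustration under stated assumptions rather than the task's real implementation: it assumes the concatenated data is CSV text with a `samplename` column, treats the thirteen non-sample columns of the table above as the variant identity, and drops the per-sample EVIDENCE column, as the table above does. The function name `shared_variants` is borrowed from the task, and the EVIDENCE values in the example input are made up.

```python
import csv
import io

# Columns that identify a variant, taken from the shared_variants_table header.
VARIANT_COLUMNS = [
    "CHROM", "POS", "TYPE", "REF", "ALT", "FTYPE", "STRAND",
    "NT_POS", "AA_POS", "EFFECT", "LOCUS_TAG", "GENE", "PRODUCT",
]


def shared_variants(concatenated_text):
    """Pivot concatenated variant rows into a variant-by-sample 0/1 table."""
    rows = list(csv.DictReader(io.StringIO(concatenated_text)))
    samples = sorted({row["samplename"] for row in rows})

    # Map each distinct variant to the set of samples in which it was reported.
    detected = {}
    for row in rows:
        key = tuple(row.get(col, "") for col in VARIANT_COLUMNS)
        detected.setdefault(key, set()).add(row["samplename"])

    table = []
    for key, found in detected.items():
        record = dict(zip(VARIANT_COLUMNS, key))
        for sample in samples:
            # 1 = variant reported for this sample; 0 = not reported
            # (not detected, or insufficient coverage to call variants).
            record[sample] = 1 if sample in found else 0
        table.append(record)
    return samples, table


# Illustrative input: positions from the table above, made-up EVIDENCE values.
concatenated = (
    "samplename,CHROM,POS,TYPE,REF,ALT,EVIDENCE\n"
    "sample1,PEKT02000007,2529234,snp,G,C,C:30 G:0\n"
    "sample2,PEKT02000007,2529234,snp,G,C,C:25 G:0\n"
    "sample1,PEKT02000002,1043926,snp,A,G,G:18 A:0\n"
)
samples, table = shared_variants(concatenated)
for record in table:
    print(record["CHROM"], record["POS"], [record[s] for s in samples])
```

Because a "0" covers both "not detected" and "insufficient coverage", a table built this way cannot distinguish the two cases; that distinction would have to come from the upstream alignment data.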
The outputs of this workflow are the concatenated_variants file and the shared_variants_table file.