kSNP3

Overview

The kSNP3 workflow is for phylogenetic analysis of bacterial genomes using single nucleotide polymorphisms (SNPs). It identifies SNPs amongst a set of genome assemblies, then calculates a number of phylogenetic trees based on those SNPs, including pan-genome and core-genome phylogenies.
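kSNP3 finds SNPs without aligning genomes: it looks for odd-length k-mers that are identical except at their central base. The toy sketch below illustrates only that core idea. The function name find_kmer_snps and its parameters are hypothetical, and the real tool does much more (for example, handling reverse complements, choosing k, and building trees), so treat this as an illustration rather than kSNP3's implementation.

```python
from collections import defaultdict


def find_kmer_snps(genomes: dict[str, str], k: int = 5) -> dict[str, dict[str, str]]:
    """Toy kSNP-style SNP finder (hypothetical helper, not part of kSNP3).

    Groups odd-length k-mers by their sequence with the central base masked.
    A masked context whose genomes disagree at the central base is reported
    as a SNP locus, mapping each genome to the base it carries there.
    Reverse complements are ignored in this sketch.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that each k-mer has a central base")
    mid = k // 2
    # masked context -> genome name -> set of central bases observed
    loci: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            context = kmer[:mid] + "." + kmer[mid + 1:]
            loci[context][name].add(kmer[mid])

    snps: dict[str, dict[str, str]] = {}
    for context, per_genome in loci.items():
        # Skip contexts where one genome carries several alleles (e.g. repeats)
        if any(len(bases) > 1 for bases in per_genome.values()):
            continue
        alleles = {next(iter(bases)) for bases in per_genome.values()}
        if len(alleles) > 1:
            snps[context] = {name: next(iter(bases)) for name, bases in per_genome.items()}
    return snps
```

For example, find_kmer_snps({"A": "AACGTTT", "B": "AACATTT"}, k=5) reports a single locus, "AC.TT", where genome A carries G and genome B carries A.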

This workflow also features an optional module, summarize_data, that creates a presence/absence matrix for the analyzed samples from a list of indicated columns (such as AMR genes, plasmid types, etc.). If the phandango_coloring variable is set to true, the matrix is formatted for visualization in Phandango; otherwise, it can be viewed in Excel.

You can learn more about the kSNP3 workflow, including how to visualize the outputs with MicrobeTrace, in the following video: 📺 Using KSNP3 in Terra and Visualizing Bacterial Genomic Networks in MicrobeTrace

Inputs

Required User Inputs

Optional User Inputs

Workflow Actions

kSNP3 is run on the set of assembly files to produce both pan-genome and core-genome phylogenies. It also produces alignment files, which snp-dists uses to produce a pairwise SNP distance matrix for both the pan-genome and the core genome.
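A pairwise SNP distance is the number of alignment positions at which two samples differ. The sketch below computes such a matrix from an in-memory alignment. It counts a position only when both bases are A, C, G, or T and they differ, which we believe approximates snp-dists' default of ignoring gaps and ambiguous bases; check the snp-dists documentation before relying on that. The function names are hypothetical and are not part of the workflow.

```python
from itertools import combinations

ACGT = set("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Count positions where both sequences carry A/C/G/T and the bases differ."""
    if len(a) != len(b):
        raise ValueError("sequences in an alignment must have equal length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in ACGT and y in ACGT and x != y
    )


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Symmetric pairwise SNP distance matrix keyed by (sample, sample)."""
    names = list(alignment)
    matrix = {(n, n): 0 for n in names}
    for n1, n2 in combinations(names, 2):
        d = snp_distance(alignment[n1], alignment[n2])
        matrix[(n1, n2)] = matrix[(n2, n1)] = d
    return matrix
```

For instance, with the alignment {"s1": "ACGTA", "s2": "ACGTT", "s3": "AC-TT"}, s1 is one SNP away from both s2 and s3, while the gap in s3 means s2 and s3 are counted as identical.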

If you fill out the data_summary_* and sample_names optional variables, you can use the optional summarize_data task. The task takes a comma-separated list of column names from the Terra data table, each of which should contain a list of comma-separated items. For example, to summarize the amrfinderplus_virulence_genes and amrfinderplus_stress_genes output columns from TheiaProk, enter "amrfinderplus_virulence_genes,amrfinderplus_stress_genes" (with quotes, comma-separated, no spaces). The task checks whether each of those items is present in each row (sample) of the data table, then writes the results to a CSV file that marks each item as present (TRUE) or absent (empty). By default, the task adds a Phandango coloring tag to group items from the same column; you can turn this off by setting phandango_coloring to false.

Outputs

All outputs

References
