Overview

MashTree_FASTA creates a phylogenetic tree using Mash distances.

Mash distances represent how many kmers two sequences have in common. To generate them, every kmer in a sequence is transformed into an integer value with hashing and Bloom filters. The hashed kmers are sorted, and a “sketch” is built from only the kmers that appear at the top of the sorted list. Two sketches are compared by counting the hashed kmers they share. Mashtree then uses a neighbor-joining algorithm to cluster these “distances” into a phylogenetic tree.
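The sketching idea described above can be illustrated with a minimal Python sketch. This is not Mash's or Mashtree's actual implementation: the function names are hypothetical, SHA-1 stands in for the hash function Mash really uses, canonical (strand-independent) kmers and Bloom filtering are omitted, and the kmer and sketch sizes are arbitrary toy values. The distance formula is the one published for Mash, D = -(1/k) ln(2j / (1 + j)), where j is the estimated Jaccard index.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all kmers of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=5, size=50):
    """Hash every kmer to an integer, sort, and keep the `size` smallest hashes.

    SHA-1 is an assumption made for illustration only.
    """
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=50):
    """Estimate the Jaccard index from the smallest hashes of both sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k):
    """Convert a Jaccard estimate into a Mash distance for kmer size k."""
    if j == 0:
        return 1.0  # no shared hashes: maximum distance
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


if __name__ == "__main__":
    a = "ACGTACGTTGCAAGCTTAGGCTAACGGTCA"
    b = "ACGTACGTTGCAAGCTTAGGCTTACGGTCA"  # one substitution relative to a
    j = jaccard_estimate(sketch(a), sketch(b))
    print(f"Jaccard estimate: {j:.3f}, Mash distance: {mash_distance(j, 5):.3f}")
```

Identical sequences give a distance of 0, and sequences with no shared hashed kmers give a distance of 1. A pairwise matrix of such distances is the kind of input a neighbor-joining step would consume.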

This workflow also features an optional module, summarize_data, that creates a presence/absence matrix for the analyzed samples from a list of indicated columns (such as AMR genes). The resulting matrix can be used in Phandango.

Inputs

Required User Inputs

Optional User Inputs

Tasks/Actions

MashTree_FASTA is run on a set of assembly FASTA files and creates a phylogenetic tree and a matrix. These outputs are passed to a task that reorders the matrix to match the order of the terminal ends (tips) in the phylogenetic tree.
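The reordering step can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the workflow's actual task: it assumes the tree is a Newick string with simple unquoted tip names and that the matrix is square, with rows and columns labeled by the same sample names. The function names `tip_order` and `reorder_matrix` are hypothetical.

```python
import re


def tip_order(newick):
    """Return tip names in the order they appear in a Newick string.

    A tip name directly follows "(" or ","; labels that follow ")" belong
    to internal nodes and are skipped. Quoted names are not handled.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def reorder_matrix(names, matrix, order):
    """Reorder the rows and columns of a square matrix to follow `order`."""
    index = {name: i for i, name in enumerate(names)}
    picks = [index[name] for name in order]
    return [[matrix[r][c] for c in picks] for r in picks]


if __name__ == "__main__":
    tree = "((B:0.1,C:0.2):0.05,A:0.3);"
    names = ["A", "B", "C"]
    matrix = [
        [0, 1, 2],
        [1, 0, 3],
        [2, 3, 0],
    ]
    order = tip_order(tree)
    for name, row in zip(order, reorder_matrix(names, matrix, order)):
        print(name, row)
```

With the matrix in tip order, each row lines up with the corresponding tip when the tree and matrix are displayed side by side.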

The optional summarize_data task performs the following only if all of the data_summary_* and sample_names optional variables are filled out:

  1. Digests a comma-separated list of column names (for example, "amrfinderplus_virulence_genes,amrfinderplus_stress_genes") that can be found within the originating Terra data table.
  2. Parses the contents of those columns and extracts each value. For example, if the amrfinder_amr_genes column for a sample contains "aph(3')-IIIa,tet(O),blaOXA-193", the summarize_data task checks every sample in the set to see whether those AMR genes were also detected in it.
  3. Outputs a .csv file that indicates presence (TRUE) or absence (empty) of each item in those columns for every sample in the set.

By default, this task appends a Phandango coloring tag so that all items from the same column are shown in the same color; this can be turned off by setting the optional phandango_coloring variable to false.
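The presence/absence logic in the steps above can be sketched in Python as follows. This is a simplified illustration, not the summarize_data task itself: the input is assumed to be a plain dictionary rather than a Terra data table, the function names are hypothetical, and the Phandango coloring tag is deliberately left out because its exact format is not described here.

```python
import csv
import io


def presence_absence(samples, columns):
    """Build a presence/absence table.

    samples: {sample_name: {column_name: "value1,value2,..."}}
    columns: column names whose comma-separated contents are split
    Returns a list of rows; the first row is the header.
    """
    # Collect every distinct value seen in the chosen columns, in first-seen order.
    items = []
    for col in columns:
        for row in samples.values():
            for value in row.get(col, "").split(","):
                value = value.strip()
                if value and value not in items:
                    items.append(value)

    table = [["sample"] + items]
    for name, row in samples.items():
        found = set()
        for col in columns:
            found.update(v.strip() for v in row.get(col, "").split(","))
        # TRUE marks presence; an empty cell marks absence.
        table.append([name] + ["TRUE" if item in found else "" for item in items])
    return table


def to_csv(table):
    """Serialize the table as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


if __name__ == "__main__":
    samples = {
        "sample1": {"amrfinder_amr_genes": "aph(3')-IIIa,tet(O)"},
        "sample2": {"amrfinder_amr_genes": "tet(O),blaOXA-193"},
    }
    print(to_csv(presence_absence(samples, ["amrfinder_amr_genes"])))
```

Each distinct value becomes one column of the output, so a gene detected in only one sample still appears as a column with empty cells for every other sample.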

Outputs

All outputs

References
