Overview
The Lyve_SET WDL workflow runs the Lyve-SET pipeline (https://github.com/lskatz/lyve-SET), developed by Lee Katz et al., for phylogenetic analysis of bacterial genomes using high-quality single nucleotide polymorphisms (hqSNPs). The workflow identifies SNPs across a set of samples by mapping sequencing reads to a reference genome, filtering for high-quality SNPs, and inferring a phylogeny with RAxML.
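To make the idea of a pairwise SNP matrix concrete, the Python sketch below counts the differences between every pair of samples in an alignment of SNP sites. It is an illustration only, not Lyve-SET's own implementation: the function name snp_distance_matrix is hypothetical, and skipping any site where either sample has an ambiguous base (N) or a gap (-) is an assumption made for this example.

```python
from itertools import combinations


def snp_distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Count pairwise SNP differences between aligned sequences.

    Hypothetical helper for illustration. Sites where either sequence has
    'N' or '-' are skipped (an assumed pairwise-deletion rule).
    """
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("all sequences must be aligned to the same length")

    names = sorted(alignment)
    dist: dict[tuple[str, str], int] = {(n, n): 0 for n in names}
    for a, b in combinations(names, 2):
        # Count informative sites where the two samples carry different bases
        d = sum(
            1
            for x, y in zip(alignment[a], alignment[b])
            if x not in "N-" and y not in "N-" and x != y
        )
        dist[(a, b)] = dist[(b, a)] = d  # the matrix is symmetric
    return dist
```

For example, with {"s1": "ACGTA", "s2": "ACGTT", "s3": "TCNTT"}, the distance between s1 and s3 is 2, because the third site is skipped (s3 has an N there).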

Inputs
Required User Inputs
Optional User Inputs
Tasks/Actions
The Lyve_SET WDL workflow is run using read data from a set of samples. The workflow will produce a pairwise SNP matrix for the sample set and a maximum likelihood phylogenetic tree. Details regarding the default implementation of Lyve_SET and optional modifications are listed below.
- Read processing
- By default, the Lyve_SET WDL workflow cleans reads with CG-Pipeline ("CGP"). Read cleaning can instead be turned off, or performed with "BayesHammer", by setting the read_cleaner input variable.
- Reference procurement
- By default, the Lyve_SET WDL workflow does not mask phages or cliffs in the reference genome. Cliffs are regions of the reference genome where read coverage rises or falls abruptly. Masking phages and cliffs is intended to exclude low-quality SNPs called in these regions. Users can enable phage and cliff masking by setting the mask_phages and mask_cliffs input variables to "true".
- SNP discovery
- The Lyve_SET WDL workflow uses the default read mapper and variant caller from the Lyve-SET pipeline (smalt and varscan, respectively). Alternative mappers and SNP callers can be selected with the mapper and snpcaller input variables.
- The workflow also uses the Lyve-SET pipeline's default variant-calling parameters: the minimum consensus fraction required to call a base is 0.75 (75% of reads) and the minimum read depth is 10X. These parameters can be changed with the min_alt_frac and min_coverage input variables.
- Phylogenetic analysis
- The Lyve_SET workflow will attempt to produce a multiple sequence alignment, a SNP distance matrix, and a phylogenetic tree. These steps can be skipped by setting nomsa = true, nomatrix = true, or notrees = true, respectively.
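The mask_cliffs option relies on the idea of a coverage "cliff". The Python sketch below flags positions where mean read depth changes abruptly between adjacent windows. It is a simplified illustration, not the detection method Lyve-SET uses: the function name find_cliffs and its window and fold parameters are hypothetical choices for this example.

```python
def find_cliffs(coverage: list[int], window: int = 5, fold: float = 3.0) -> list[int]:
    """Flag positions where mean coverage jumps or drops abruptly.

    Hypothetical illustration: compares the mean depth of the `window`
    positions before and after each position; a fold change of at least
    `fold` (or coverage falling to zero on one side) marks a cliff.
    """
    cliffs = []
    for i in range(window, len(coverage) - window + 1):
        left = sum(coverage[i - window:i]) / window
        right = sum(coverage[i:i + window]) / window
        low, high = min(left, right), max(left, right)
        if high > 0 and (low == 0 or high / low >= fold):
            cliffs.append(i)
    return cliffs
```

For a toy coverage profile of ten positions at 30X followed by ten at 2X, find_cliffs([30] * 10 + [2] * 10) returns [9, 10, 11, 12, 13, 14], a short stretch around the drop. Masking would then exclude SNPs called in such stretches.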
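The effect of the min_alt_frac and min_coverage thresholds can be sketched as a single per-site decision. The Python example below is a simplified model, assuming a site is represented as a string of observed bases (one character per read) and that a site failing either threshold is reported as ambiguous ("N"). The function name call_base is hypothetical, and real variant callers such as varscan work from richer pileup data.

```python
from collections import Counter


def call_base(pileup: str, min_alt_frac: float = 0.75, min_coverage: int = 10) -> str:
    """Return the consensus base at one site, or 'N' if thresholds are not met.

    Hypothetical sketch: `pileup` holds one observed base per read.
    Defaults mirror the Lyve-SET defaults described above (0.75 and 10X).
    """
    depth = len(pileup)
    if depth < min_coverage:
        return "N"  # too few reads to trust any call
    base, count = Counter(pileup.upper()).most_common(1)[0]
    if count / depth < min_alt_frac:
        return "N"  # no base reaches the required consensus fraction
    return base
```

With these defaults, call_base("A" * 8 + "G" * 2) returns "A" (80% consensus at 10X), while call_base("A" * 7 + "G" * 3) returns "N" (70% consensus) and call_base("A" * 9) returns "N" (only 9X depth).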
Outputs
For full descriptions of Lyve-SET pipeline outputs, we recommend consulting the Lyve-SET documentation: https://github.com/lskatz/lyve-SET/blob/master/docs/OUTPUT.md
The following output files are written to the Terra data table. However, certain files may not appear in the data table after a run, for two main reasons:
- The user instructed the workflow to skip an analysis step (for example, if notrees = true, no tree file will appear)
- The workflow skipped an analysis step because of an issue with the input data
In addition to these outputs, all files produced by the Lyve-SET pipeline, including intermediate files and the individual BAM and VCF files for each sample, are available in the task-level outputs. These files can be accessed by viewing the execution directory for the run.
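If the execution directory has been copied to a local machine, the per-sample BAM and VCF files can be gathered with a short script. The Python sketch below assumes only that these files keep their usual .bam, .vcf, or .vcf.gz extensions somewhere under the copied directory; the function name find_sample_files and the directory name in the usage line are hypothetical, and the Lyve-SET output documentation should be consulted for the actual layout.

```python
from pathlib import Path


def find_sample_files(execution_dir: str) -> dict[str, list[Path]]:
    """Recursively collect BAM and VCF files under a local copy of a run directory.

    Hypothetical helper: relies only on file extensions, not on any
    particular Lyve-SET directory layout.
    """
    root = Path(execution_dir)
    return {
        "bam": sorted(root.rglob("*.bam")),
        # "*.vcf" does not match "*.vcf.gz", so collect both patterns
        "vcf": sorted(list(root.rglob("*.vcf")) + list(root.rglob("*.vcf.gz"))),
    }
```

Usage, with a hypothetical local path: files = find_sample_files("lyveset_execution"), after which files["bam"] and files["vcf"] hold sorted lists of paths.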
References