Overview

The TheiaProk workflows are for the assembly, quality assessment, and characterization of bacterial genomes. There are currently four TheiaProk workflows designed to accommodate different kinds of input data:

  1. Illumina paired-end sequencing (TheiaProk_Illumina_PE)
  2. Illumina single-end sequencing (TheiaProk_Illumina_SE)
  3. ONT sequencing (TheiaProk_ONT)
  4. Genome assemblies (TheiaProk_FASTA)
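As an illustration of the list above, the choice of workflow can be written as a simple lookup from input data type to workflow name. This is only a sketch: the `InputType` enum and the `select_workflow` helper are hypothetical names introduced here, not part of TheiaProk. Only the four workflow names come from this page.

```python
from enum import Enum


class InputType(Enum):
    """Kinds of input data named in the overview (enum is illustrative only)."""
    ILLUMINA_PAIRED_END = "Illumina paired-end sequencing"
    ILLUMINA_SINGLE_END = "Illumina single-end sequencing"
    ONT = "ONT sequencing"
    GENOME_ASSEMBLY = "Genome assemblies"


# Workflow names as listed on this page.
WORKFLOW_FOR_INPUT = {
    InputType.ILLUMINA_PAIRED_END: "TheiaProk_Illumina_PE",
    InputType.ILLUMINA_SINGLE_END: "TheiaProk_Illumina_SE",
    InputType.ONT: "TheiaProk_ONT",
    InputType.GENOME_ASSEMBLY: "TheiaProk_FASTA",
}


def select_workflow(input_type: InputType) -> str:
    """Return the TheiaProk workflow that accepts the given input data type."""
    return WORKFLOW_FOR_INPUT[input_type]


if __name__ == "__main__":
    for kind in InputType:
        print(f"{kind.value} -> {select_workflow(kind)}")
```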

Reads supplied to the TheiaProk Illumina and ONT workflows first pass through “core tasks”, which perform read trimming and assembly appropriate to the input data type. TheiaProk workflows then launch default genome characterization modules for quality assessment, species identification, antimicrobial resistance gene detection, sequence typing, and more. For certain identified taxa, “taxa-specific sub-workflows” are activated automatically to perform additional taxa-specific characterization steps. When setting up each workflow, users may also choose “optional tasks” as additions or alternatives to the tasks run by default.

Inputs


TheiaProk_Illumina_PE

TheiaProk_Illumina_SE

TheiaProk_ONT

TheiaProk_FASTA

Core Tasks for TheiaProk_Illumina_PE and TheiaProk_Illumina_SE


versioning: Version Capture for TheiaProk

screen: Total Raw Read Quantification and Genome Size Estimation

read_QC_trim: Read Quality Trimming, Adapter Removal, Quantification, and Identification

CG-Pipeline: Assessment of Read Quality and Estimation of Genome Coverage

shovill: De novo Assembly

Core Tasks for TheiaProk_ONT


versioning: Version Capture for TheiaProk

screen: Total Raw Read Quantification and Genome Size Estimation

read_QC_trim_ont: Read Quality Trimming, Quantification, and Identification

dragonflye: De novo Assembly
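The two core-task lists above can be summarized side by side in a small lookup table. The sketch below is illustrative only. The task names are taken from this page, but the `core_tasks` helper is hypothetical, and the list order mirrors the order on this page, which is assumed rather than confirmed to be the execution order. No core tasks are listed here for TheiaProk_FASTA, so the helper returns an empty list for it.

```python
# Core tasks per workflow, in the order they are listed on this page.
# Assumption: listing order is not guaranteed to be execution order.
ILLUMINA_CORE_TASKS = [
    "versioning",
    "screen",
    "read_QC_trim",
    "CG-Pipeline",
    "shovill",
]

CORE_TASKS = {
    "TheiaProk_Illumina_PE": ILLUMINA_CORE_TASKS,
    "TheiaProk_Illumina_SE": ILLUMINA_CORE_TASKS,
    "TheiaProk_ONT": [
        "versioning",
        "screen",
        "read_QC_trim_ont",
        "dragonflye",
    ],
}


def core_tasks(workflow: str) -> list[str]:
    """Return a copy of the core tasks listed for a workflow (empty if none listed)."""
    return list(CORE_TASKS.get(workflow, []))


if __name__ == "__main__":
    for name in ("TheiaProk_Illumina_PE", "TheiaProk_ONT", "TheiaProk_FASTA"):
        print(name, core_tasks(name))
```

Written this way, the difference between the Illumina and ONT workflows is easy to see: the trimming task changes (read_QC_trim versus read_QC_trim_ont), the assembler changes (shovill versus dragonflye), and CG-Pipeline is listed only for Illumina.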

Default Sample Characterization

The following tasks are available in all TheiaProk workflows. Tasks marked optional or alternative run only when selected by the user.


QUAST: Assembly Quality Assessment

BUSCO: Assembly Quality Assessment

MUMmer_ANI: Average Nucleotide Identity (optional)

GAMBIT: Taxon Assignment

KmerFinder: Taxon Assignment (optional)

AMRFinderPlus: AMR Genotyping (default)

ResFinder: AMR Genotyping (alternative)

TS_MLST: MLST Profiling

Prokka: Assembly Annotation (default)

Bakta: Assembly Annotation (alternative)

PlasmidFinder: Plasmid Identification

QC_check: Check QC Metrics Against User-Defined Thresholds (optional)

Taxon Tables: Copy outputs to new data tables based on taxonomic assignment (optional)
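The labels in the list above (default, alternative, optional) can be captured in a small table to show which tasks need an explicit choice by the user. This is a rough sketch, not the workflow's own configuration: the `CharacterizationTask` record, the "standard" status used for unlabeled tasks, and the `opt_in_tasks` helper are all hypothetical. The sketch also does not model whether an alternative replaces its default or runs alongside it, since this page does not say.

```python
from typing import NamedTuple


class CharacterizationTask(NamedTuple):
    name: str
    purpose: str
    status: str  # "standard" is this sketch's label for tasks with no label on the page


TASKS = [
    CharacterizationTask("QUAST", "Assembly Quality Assessment", "standard"),
    CharacterizationTask("BUSCO", "Assembly Quality Assessment", "standard"),
    CharacterizationTask("MUMmer_ANI", "Average Nucleotide Identity", "optional"),
    CharacterizationTask("GAMBIT", "Taxon Assignment", "standard"),
    CharacterizationTask("KmerFinder", "Taxon Assignment", "optional"),
    CharacterizationTask("AMRFinderPlus", "AMR Genotyping", "default"),
    CharacterizationTask("ResFinder", "AMR Genotyping", "alternative"),
    CharacterizationTask("TS_MLST", "MLST Profiling", "standard"),
    CharacterizationTask("Prokka", "Assembly Annotation", "default"),
    CharacterizationTask("Bakta", "Assembly Annotation", "alternative"),
    CharacterizationTask("PlasmidFinder", "Plasmid Identification", "standard"),
    CharacterizationTask("QC_check", "Check QC Metrics Against User-Defined Thresholds", "optional"),
    CharacterizationTask("Taxon Tables", "Copy outputs to new data tables", "optional"),
]


def opt_in_tasks() -> list[str]:
    """Names of tasks labeled optional or alternative, in page order."""
    return [t.name for t in TASKS if t.status in ("optional", "alternative")]


if __name__ == "__main__":
    print("Require an explicit choice:", ", ".join(opt_in_tasks()))
```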

Taxa-specific sub-workflows


The TheiaProk workflows automatically activate taxa-specific sub-workflows when GAMBIT identifies a relevant taxon. Alternatively, the user can provide the expected taxon in the expected_taxon workflow input to override the taxonomic assignment made by GAMBIT. Modules are launched for all TheiaProk workflows unless otherwise indicated.

Acinetobacter baumannii

Escherichia or Shigella spp

Haemophilus influenzae

Klebsiella spp

Legionella pneumophila

Listeria monocytogenes

Mycobacterium tuberculosis

Neisseria spp

Pseudomonas aeruginosa

Salmonella spp

Staphylococcus aureus

Streptococcus pneumoniae

Streptococcus pyogenes

Vibrio spp
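The override behaviour described at the start of this section can be sketched as follows: a user-supplied expected_taxon, when present, takes precedence over the GAMBIT assignment, and the resulting taxon is checked against the taxa listed above. This is a minimal sketch under stated assumptions. The function names are hypothetical, and the case-insensitive prefix matching is an assumption made for illustration, not the workflow's actual matching logic.

```python
from typing import Optional

# Taxa with sub-workflows, as listed on this page ("spp" entries reduced to genus).
TAXA_WITH_SUBWORKFLOWS = [
    "Acinetobacter baumannii",
    "Escherichia",
    "Shigella",
    "Haemophilus influenzae",
    "Klebsiella",
    "Legionella pneumophila",
    "Listeria monocytogenes",
    "Mycobacterium tuberculosis",
    "Neisseria",
    "Pseudomonas aeruginosa",
    "Salmonella",
    "Staphylococcus aureus",
    "Streptococcus pneumoniae",
    "Streptococcus pyogenes",
    "Vibrio",
]


def effective_taxon(gambit_taxon: str, expected_taxon: Optional[str] = None) -> str:
    """A non-empty expected_taxon overrides the GAMBIT assignment."""
    return expected_taxon if expected_taxon else gambit_taxon


def matching_subworkflow(gambit_taxon: str, expected_taxon: Optional[str] = None) -> Optional[str]:
    """Return the listed taxon that matches, or None if no sub-workflow applies.

    Assumption: a match is an exact name or a genus prefix, ignoring case.
    """
    taxon = effective_taxon(gambit_taxon, expected_taxon).strip().lower()
    for entry in TAXA_WITH_SUBWORKFLOWS:
        key = entry.lower()
        if taxon == key or taxon.startswith(key + " "):
            return entry
    return None


if __name__ == "__main__":
    print(matching_subworkflow("Klebsiella pneumoniae"))
    print(matching_subworkflow("Bacillus subtilis", expected_taxon="Salmonella enterica"))
    print(matching_subworkflow("Bacillus subtilis"))
```

In the second call, the GAMBIT assignment would not trigger any sub-workflow, but the expected_taxon value does, which is the override case described above.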

Outputs


TheiaProk_ONT_PHB Outputs