

Overview

The TheiaProk workflows are for the assembly, quality assessment, and characterization of bacterial genomes.

There are currently two TheiaProk workflows: one for Illumina paired-end sequencing data (TheiaProk_Illumina_PE) and one for Illumina single-end sequencing data (TheiaProk_Illumina_SE). Apart from the input data type, the two workflows differ only minimally.

In each workflow, all input reads are processed through “core tasks”. These perform read trimming and assembly, quality assessment, species identification, and some genome characterization. For certain identified taxa, “taxa-specific sub-workflows” are activated automatically and perform additional taxa-specific characterization steps. When setting up each workflow, users may also choose “optional tasks”, either as additions to or as alternatives for the tasks that run by default.

Inputs


TheiaProk_Illumina_PE

TheiaProk_Illumina_SE

Core Tasks


versioning: Version Capture for TheiaProk

screen: Total Raw Read Quantification and Genome Size Estimation

read_QC_trim: Read Quality Trimming, Adapter Removal, Quantification, and Identification

CG-Pipeline: Assessment of Read Quality and Estimation of Genome Coverage

shovill: De novo Assembly

QUAST: Assembly Quality Assessment

BUSCO: Assembly Quality Assessment

MUMmer_ANI: Average Nucleotide Identity (optional)

GAMBIT: Taxon Assignment

AMRFinderPlus: AMR Genotyping (default)

ResFinder: AMR Genotyping (alternative)

TS_MLST: MLST Profiling

Prokka: Assembly Annotation (default)

Bakta: Assembly Annotation (alternative)

PlasmidFinder: Plasmid Identification

QC_check: Check QC Metrics Against User-Defined Thresholds (optional)
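
The default, alternative, and optional labels in the list above can be read as a simple task-selection plan. The following is a minimal Python sketch of that reading, not the workflow's actual implementation: the function `plan_core_tasks` and its parameters (`annotator`, `amr_tool`, `run_ani`, `run_qc_check`) are hypothetical names chosen for illustration, and it assumes that each "alternative" task replaces the corresponding default and that tasks run in the order listed.

```python
from typing import List

# Hypothetical lookup tables; task names are taken from the Core Tasks list.
ANNOTATORS = {"prokka": "Prokka", "bakta": "Bakta"}
AMR_TOOLS = {"amrfinderplus": "AMRFinderPlus", "resfinder": "ResFinder"}


def plan_core_tasks(
    annotator: str = "prokka",
    amr_tool: str = "amrfinderplus",
    run_ani: bool = False,
    run_qc_check: bool = False,
) -> List[str]:
    """Return the ordered list of core tasks for one sample (illustrative only)."""
    if annotator not in ANNOTATORS:
        raise ValueError(f"unknown annotator: {annotator!r}")
    if amr_tool not in AMR_TOOLS:
        raise ValueError(f"unknown AMR tool: {amr_tool!r}")

    tasks = [
        "versioning", "screen", "read_QC_trim", "CG-Pipeline",
        "shovill", "QUAST", "BUSCO",
    ]
    if run_ani:                      # MUMmer_ANI is marked optional
        tasks.append("MUMmer_ANI")
    tasks.append("GAMBIT")
    tasks.append(AMR_TOOLS[amr_tool])    # assumed: alternative replaces default
    tasks.append("TS_MLST")
    tasks.append(ANNOTATORS[annotator])  # assumed: alternative replaces default
    tasks.append("PlasmidFinder")
    if run_qc_check:                 # QC_check is marked optional
        tasks.append("QC_check")
    return tasks
```

Calling `plan_core_tasks()` with no arguments yields the default plan with Prokka and AMRFinderPlus; passing `annotator="bakta"` swaps only the annotation step.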

Taxa-specific sub-workflows


The TheiaProk workflow automatically activates taxa-specific sub-workflows after identification of relevant taxa using GAMBIT.

Escherichia spp

Shigella spp

Salmonella spp

Listeria monocytogenes

Legionella pneumophila

Klebsiella spp

Mycobacterium tuberculosis

Acinetobacter baumannii

Pseudomonas aeruginosa

Streptococcus pneumoniae

Neisseria spp

Staphylococcus aureus

Vibrio spp
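
The automatic activation described above amounts to a lookup from the GAMBIT-predicted taxon to a sub-workflow. The Python sketch below illustrates the idea under stated assumptions: the taxon names come from the list above, while the function `select_subworkflow` and the matching rule (genus match for the "spp" entries, genus-plus-species match for the others) are hypothetical and may differ from how the workflow actually evaluates the GAMBIT result.

```python
from typing import Optional

# Taxa listed with "spp" are assumed to match on genus alone.
GENUS_LEVEL = {
    "Escherichia", "Shigella", "Salmonella",
    "Klebsiella", "Neisseria", "Vibrio",
}

# Taxa listed as a full binomial are assumed to require a species match.
SPECIES_LEVEL = {
    "Listeria monocytogenes", "Legionella pneumophila",
    "Mycobacterium tuberculosis", "Acinetobacter baumannii",
    "Pseudomonas aeruginosa", "Streptococcus pneumoniae",
    "Staphylococcus aureus",
}


def select_subworkflow(predicted_taxon: str) -> Optional[str]:
    """Return the sub-workflow label for a predicted taxon, or None if none applies."""
    words = predicted_taxon.strip().split()
    if not words:
        return None
    genus = words[0]
    if genus in GENUS_LEVEL:
        return f"{genus} spp"
    if len(words) >= 2:
        species = f"{words[0]} {words[1]}"
        if species in SPECIES_LEVEL:
            return species
    return None  # only the core tasks run for this sample
```

Under these assumptions, a sample predicted as `Vibrio cholerae` would be routed to the Vibrio spp sub-workflow, while `Listeria innocua` would receive core tasks only.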

Outputs


TheiaProk_Illumina_PE Outputs

