Overview

The TheiaEuk_PE workflow is for the assembly, quality assessment, and characterization of fungal genomes. It is designed to accept Illumina paired-end sequencing data as the primary input.

All input reads are processed through the workflow's “core tasks”: raw-read quality assessment, read cleaning (quality trimming and adapter removal), de novo assembly, assembly quality assessment, and species identification. For some identified taxa, “taxa-specific sub-workflows” are activated automatically and perform additional characterization steps, including clade typing and/or antifungal resistance detection.


Inputs

TheiaEuk_PE has both required and optional inputs. The page linked below lists every input variable as it appears in the Terra workflow input form, along with a description of each variable, its default value, and links to the relevant sections of this documentation page.

TheiaEuk_PE Inputs

<aside> ℹ️ Input read data

The TheiaEuk_PE workflow takes Illumina paired-end read data as input. Read file names should end with .fastq or .fq, optionally followed by .gz. When possible, Theiagen recommends compressing files with gzip before uploading them to Terra to minimize upload time.

By default, the workflow anticipates 2 x 150 bp reads (i.e., reads generated using a 300-cycle sequencing kit). The optional trim_minlen parameter may need to be adjusted to accommodate shorter reads, such as the 2 x 75 bp reads generated using a 150-cycle sequencing kit.

</aside>
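The file-naming and read-length points above can be checked locally before upload. The following is a minimal Python sketch, not part of TheiaEuk_PE itself: the helper names `is_valid_read_filename` and `observed_read_lengths` are hypothetical, case-sensitive extension matching is an assumption, and the length check assumes standard four-line FASTQ records.

```python
import gzip
import re
from pathlib import Path

# Endings accepted per the note above: .fastq or .fq, optionally followed by .gz.
# Case-sensitive matching is an assumption of this sketch.
READ_NAME_PATTERN = re.compile(r".+\.(fastq|fq)(\.gz)?")


def is_valid_read_filename(name: str) -> bool:
    """Return True if the file name ends with .fastq or .fq, optionally plus .gz."""
    return READ_NAME_PATTERN.fullmatch(Path(name).name) is not None


def observed_read_lengths(path: str, max_reads: int = 1000) -> set:
    """Collect the distinct sequence lengths among the first max_reads records.

    Assumes standard four-line FASTQ records (header, sequence, '+', qualities).
    """
    opener = gzip.open if path.endswith(".gz") else open
    lengths = set()
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number // 4 >= max_reads:
                break
            if line_number % 4 == 1:  # second line of each record is the sequence
                lengths.add(len(line.strip()))
    return lengths
```

If the observed lengths are well below 150 bp (for example, 75 bp reads), that is a cue to review the value of trim_minlen in the workflow inputs before launching.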


Core Workflow Components

versioning: Version Capture for TheiaEuk

screen: Total Raw Read Quantification and Genome Size Estimation

rasusa: Read Subsampling

read_QC_trim: Read Quality Trimming, Adapter Removal, and Quantification

shovill: De novo Assembly

QUAST: Assembly Quality Assessment

CG-Pipeline: Assessment of Read Quality and Estimation of Genome Coverage

GAMBIT: Taxon Assignment

TS_MLST: MLST Profiling

QC_check: Check QC Metrics Against User-Defined Thresholds (optional)


Taxa-specific sub-workflows

The TheiaEuk workflow automatically activates taxa-specific sub-workflows after identification of relevant taxa using GAMBIT. Many of these taxa-specific workflows do not require any additional workflow inputs from the user.

Candida auris

Candida albicans

Aspergillus fumigatus

Cryptococcus neoformans
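
The activation rule described above amounts to a lookup on the taxon that GAMBIT reports. The sketch below is illustrative only: the function name `match_taxon_specific_subworkflow` is hypothetical, and exact, case-insensitive matching on the species name is an assumption, since this page does not describe how the workflow compares GAMBIT output internally.

```python
from typing import Optional

# Taxa listed on this page as having a taxa-specific sub-workflow.
TAXA_WITH_SUBWORKFLOWS = (
    "Candida auris",
    "Candida albicans",
    "Aspergillus fumigatus",
    "Cryptococcus neoformans",
)


def match_taxon_specific_subworkflow(predicted_taxon: str) -> Optional[str]:
    """Return the listed taxon matching a predicted taxon string, or None.

    Whitespace is collapsed and case is ignored before comparison
    (an assumption of this sketch, not documented workflow behavior).
    """
    normalized = " ".join(predicted_taxon.split()).lower()
    for taxon in TAXA_WITH_SUBWORKFLOWS:
        if normalized == taxon.lower():
            return taxon
    return None
```

A return value of None corresponds to a sample that passes through the core tasks only.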


Outputs

