Date of Last Update:

August 2nd 2024

PHB version:

v2.1.0

Getting Started with Genomic Analysis in Public Health

<aside> <img src="https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/b61fcdac-c45a-44d3-86b9-a241232128cc/download.png" alt="https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/b61fcdac-c45a-44d3-86b9-a241232128cc/download.png" width="40px" /> Theiagen’s approach to genomic analysis in public health typically uses the Terra platform to run workflows that undertake bioinformatic analysis, then uses other platforms for visualization of the resulting data. This is described in more depth in our paper “Accelerating bioinformatics implementation in public health”, and the application of this approach for genomic surveillance of SARS-CoV-2 in California is described in the paper “Pathogen genomics in public health laboratories: successes, challenges, and lessons learned from California’s SARS-CoV-2 Whole-Genome Sequencing Initiative, California COVIDNet”.

</aside>

<aside> <img src="https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/39a844d4-3053-4f0c-a8c3-dc265e8f9325/Picture3.png" alt="https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/39a844d4-3053-4f0c-a8c3-dc265e8f9325/Picture3.png" width="40px" /> When undertaking genomic analysis using Terra and other data visualization platforms, it is essential to consider the necessary and appropriate workflows and resources for your analysis. To help you make these choices, take a look at the relationship between the most commonly used Theiagen workflows in the diagram, and the descriptions of the major stages in genomic data analysis below.

</aside>

<aside> <img src="https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/a7a39960-0058-472b-bafa-5109dd1bd393/Picture3.png" alt="https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/a7a39960-0058-472b-bafa-5109dd1bd393/Picture3.png" width="40px" /> Detailed documentation for each PHB release, including helpful workflow input and output explanations, can be found on the Public Health Resources page!

Theiagen Public Health Resources

</aside>

Analysis Approaches for Genomic Data: This diagram shows the Theiagen workflows (green boxes) available on Terra for analysis of genomic data in public health and the workflows that may be used consecutively (arrows). The blue boxes describe the major functions that these workflows undertake. Descriptions of these functions and their workflows can be found in the Getting Started with Genomic Analysis in Public Health section below. The yellow boxes show functions that may be undertaken independently of workflows on Terra.

Data Import to Terra

To start using Terra for data analysis, you will first need to import your data into your workspace. There are multiple ways to do this:

Genome assembly, QC, and characterization

Theia workflows

The Theia workflows are used for genome assembly, quality control, and characterization. The TheiaCoV Workflow Series, TheiaProk Workflow Series, and TheiaEuk Workflow Series are intended for viral, bacterial, and fungal pathogens, respectively. The TheiaMeta Workflow Series is intended for the analysis of a single taxon from metagenomic data.

<aside> ✅ SOPs </aside>

Quality evaluation

The TheiaX workflows generate various quality metrics. These should be evaluated against quality thresholds that have been agreed upon within your laboratory or sequencing program and that define the minimum quality required for sequence data and a genome to be used. For the TheiaCoV Workflow Series, TheiaProk Workflow Series, and TheiaEuk Workflow Series, this quality evaluation may be undertaken using the optional QC_check task. Full instructions for the use of this task may be found on the relevant workflow page. Some quality metrics are not evaluated by the QC_check task and should be evaluated manually.

Genomes that fail to meet agreed quality thresholds should not be used. Characterization results for these genomes may be inaccurate or unreliable, and including poor-quality genomes in downstream comparative analyses will bias the results. Samples that fail to meet QC thresholds will need to be re-sequenced, and sample processing may need to be repeated (e.g., culture-based isolation of clonal bacteria, DNA/RNA extraction, and processing for sequencing).
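To make the idea of evaluating metrics against agreed thresholds concrete, here is a minimal Python sketch. It is not the QC_check task and does not reproduce its inputs or behavior. The metric names (`mean_coverage_depth`, `assembly_length_bp`, `number_of_contigs`) and every threshold value are hypothetical placeholders; real thresholds must come from your laboratory or sequencing program.

```python
from typing import Dict, List, Optional, Tuple

# (minimum, maximum) per metric; None means "no bound on that side".
Bounds = Tuple[Optional[float], Optional[float]]

# Hypothetical thresholds for illustration only.
EXAMPLE_THRESHOLDS: Dict[str, Bounds] = {
    "mean_coverage_depth": (40.0, None),
    "assembly_length_bp": (4_500_000, 5_500_000),
    "number_of_contigs": (None, 200),
}


def evaluate_qc(metrics: Dict[str, float],
                thresholds: Dict[str, Bounds]) -> List[str]:
    """Return a list of failure messages; an empty list means the sample passes."""
    failures = []
    for name, (low, high) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A metric that was never reported cannot be shown to pass.
            failures.append(f"{name}: missing")
        elif low is not None and value < low:
            failures.append(f"{name}: {value} below minimum {low}")
        elif high is not None and value > high:
            failures.append(f"{name}: {value} above maximum {high}")
    return failures


# Hypothetical sample with low coverage and a fragmented assembly.
sample_metrics = {
    "mean_coverage_depth": 22.5,
    "assembly_length_bp": 5_010_000,
    "number_of_contigs": 350,
}

for message in evaluate_qc(sample_metrics, EXAMPLE_THRESHOLDS):
    print("FAIL", message)
```

Treating a missing metric as a failure is a deliberate choice in this sketch: a sample should only pass when every agreed metric has been checked.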

Update workflows for SARS-CoV-2 genomes

Workflows are available for updating the Pangolin and VADR assignments made to SARS-CoV-2 genomes. The Pangolin_Update workflow accounts for the delay in naming newly emerging lineages: genomes you have already sequenced may belong to a lineage that had not yet been named when they were first analyzed. The VADR_Update workflow similarly accounts for features newly identified in SARS-CoV-2 genomes when assessing genome quality with VADR.
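After re-running lineage assignment, it can be useful to see which samples actually changed. The Python sketch below compares two sets of assignments and reports the differences. It is an illustration only, not part of any Theiagen workflow; the sample IDs and the before/after assignments are made up for the example.

```python
from typing import Dict, Tuple


def changed_lineages(previous: Dict[str, str],
                     updated: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {sample_id: (old_lineage, new_lineage)} for samples whose call differs."""
    changes = {}
    for sample_id, new_lineage in updated.items():
        old_lineage = previous.get(sample_id)
        # Samples with no earlier assignment are skipped: nothing to compare.
        if old_lineage is not None and old_lineage != new_lineage:
            changes[sample_id] = (old_lineage, new_lineage)
    return changes


# Hypothetical assignments before and after an update.
before = {"sample_01": "B.1.1.529", "sample_02": "AY.4", "sample_03": "B.1.1.529"}
after = {"sample_01": "BA.1", "sample_02": "AY.4", "sample_03": "BA.2"}

for sample_id, (old, new) in sorted(changed_lineages(before, after).items()):
    print(f"{sample_id}: {old} -> {new}")
```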

Phylogenetics

Phylogenetic construction

Phylogenetic trees are constructed to assess the evolutionary relationships between the sequences they contain. These evolutionary relationships are often used as a proxy for epidemiological relationships, and sometimes for inferring transmission between isolation sources.

There are various methods for constructing phylogenetic trees, depending on the sequencing data being used, the organism being analyzed and how it evolved, what you would like to infer from the tree, and the computational resources available for the tree construction. Theiagen has a number of workflows for constructing phylogenetic trees. For full details of these workflows, please see the Guide to Phylogenetics, which includes advice on the appropriate tree-building workflows and phylogenetic visualization approaches.

<aside> ✅ SOPs </aside>

Phylogenetic placement

Phylogenetic placement is used to place your own sequences onto an existing phylogenetic tree. This may be used to find the closest relatives to your sequence(s). More details, including phylogenetic visualization approaches, can be found in the Guide to Phylogenetics.
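Once a sequence has been placed on a tree, "closest relatives" can be read off as the tips with the smallest patristic distance (the sum of branch lengths along the path between two tips). The Python sketch below illustrates that final step only; it does not perform placement and is not how any Theiagen workflow is implemented. The tree, its node names, and its branch lengths are hypothetical, and the tree is stored as a simple mapping from each node to its parent and branch length.

```python
from typing import Dict, List, Optional, Tuple

# node -> (parent, length of the branch leading to that parent); the root has parent None.
Tree = Dict[str, Tuple[Optional[str], float]]

# Hypothetical tree in which "query" has already been placed next to "tip_A".
EXAMPLE_TREE: Tree = {
    "root": (None, 0.0),
    "inner": ("root", 0.10),
    "tip_A": ("inner", 0.20),
    "query": ("inner", 0.05),
    "tip_B": ("root", 0.40),
}


def distances_to_ancestors(tree: Tree, node: str) -> Dict[str, float]:
    """Map the node and each of its ancestors to their distance from the node."""
    distances = {}
    total = 0.0
    current: Optional[str] = node
    while current is not None:
        distances[current] = total
        parent, branch_length = tree[current]
        total += branch_length
        current = parent
    return distances


def patristic_distance(tree: Tree, a: str, b: str) -> float:
    """Sum of branch lengths on the path between a and b."""
    up_a = distances_to_ancestors(tree, a)
    up_b = distances_to_ancestors(tree, b)
    # With non-negative branch lengths, the most recent common ancestor
    # is the shared ancestor that minimizes the combined distance.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def closest_relatives(tree: Tree, query: str, tips: List[str]) -> List[Tuple[str, float]]:
    """Return (tip, distance) pairs sorted from nearest to farthest."""
    pairs = [(tip, patristic_distance(tree, query, tip)) for tip in tips if tip != query]
    return sorted(pairs, key=lambda pair: pair[1])


for tip, distance in closest_relatives(EXAMPLE_TREE, "query", ["tip_A", "tip_B"]):
    print(f"{tip}\t{distance:.2f}")
```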

Public Data Sharing

<aside> ✅ SOPs </aside>

SARS-CoV-2 Metagenomic Analysis

<aside> ✅ SOPs </aside>