Overview

Freyja is a tool for analyzing genomic sequencing data from mixed viral samples. Developed by Joshua Levy from the Andersen Lab, it performs two main steps:

Single nucleotide variant (SNV) frequency estimation;
Depth-weighted demixing using constrained least absolute deviation regression.
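
As a rough illustration of the second step, the toy sketch below estimates the mix of two lineages by minimizing a depth-weighted sum of absolute deviations, with abundances constrained to be non-negative and to sum to one. It is not Freyja's implementation: a brute-force grid search stands in for a proper solver, and the function name, the `log2(depth + 1)` weighting, and the example data are all assumptions made for illustration.

```python
import math


def demix_two_lineages(barcodes, freqs, depths, steps=1000):
    """Toy depth-weighted least absolute deviation demixing for two lineages.

    barcodes: one (in_lineage_a, in_lineage_b) pair of 0/1 flags per mutation.
    freqs:    observed frequency of each mutation in the mixed sample.
    depths:   sequencing depth at each mutation site.
    Returns (abundance_a, abundance_b), non-negative and summing to 1.
    """
    # Assumed weighting: deeper sites count more, with diminishing returns.
    weights = [math.log2(d + 1) for d in depths]

    best_a, best_loss = 0.0, float("inf")
    for k in range(steps + 1):
        a = k / steps  # candidate abundance of lineage A; lineage B gets 1 - a
        loss = 0.0
        for (in_a, in_b), freq, w in zip(barcodes, freqs, weights):
            predicted = a * in_a + (1 - a) * in_b
            loss += w * abs(predicted - freq)
        if loss < best_loss:
            best_a, best_loss = a, loss
    return best_a, 1 - best_a


# Hypothetical sample: mutation 1 is unique to A, mutation 2 to B, mutation 3 is shared.
abundances = demix_two_lineages(
    barcodes=[(1, 0), (0, 1), (1, 1)],
    freqs=[0.7, 0.3, 1.0],
    depths=[100, 100, 100],
)
print(abundances)
```

Restricting the search to a single fraction `a` between 0 and 1 enforces both constraints at once. Real samples involve many lineages, where a grid search is no longer practical.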

Additional post-processing steps can produce visualizations of aggregated samples.

<aside> The typical use case of Freyja is to analyze mixed SARS-CoV-2 samples from a sequencing dataset, most often wastewater.

</aside>

<aside> ⚠️ The defaults included in the Freyja workflows reflect this use case but can be adjusted for other pathogens. Please see the Running Freyja on other pathogens section for more information.

</aside>

Figure 1: Workflow diagram for the Freyja_FASTQ_PHB workflow. Depending on the type of data (Illumina or Oxford Nanopore), the Read QC and Filtering steps and the Read Alignment steps use different software. The user can specify whether the barcodes and lineages files should be updated with freyja update before running Freyja, and whether bootstrapping should be performed with freyja boot.

Four workflows have been created that perform different parts of Freyja:

The main workflow is Freyja_FASTQ_PHB (Figure 1). Depending on the type of input data (Illumina paired-end, Illumina single-end, or ONT), it runs various QC modules before aligning the sample to the provided reference file with either BWA (Illumina) or minimap2 (ONT), followed by primer trimming with iVar. After preprocessing is complete, Freyja is run to generate relative lineage abundances (demix) from the sample. Optional bootstrapping may be performed.
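
For orientation, the Freyja portion of this workflow roughly corresponds to the command sequence sketched below. The helper only assembles argument lists and does not run anything. The subcommands and flags follow the Freyja command-line documentation as best understood here and should be checked against the installed version. The helper name, the output file names, and the bootstrap settings (4 threads, 100 replicates) are hypothetical, and the QC, alignment, and primer trimming steps are not covered.

```python
import shlex


def freyja_commands(bam, reference, sample, bootstrap=False):
    """Assemble (but do not run) Freyja commands for one primer-trimmed BAM file."""
    variants = f"{sample}_variants.tsv"  # hypothetical output names
    depths = f"{sample}_depths.tsv"
    commands = [
        # Step 1: SNV frequency and depth estimation.
        ["freyja", "variants", bam,
         "--variants", variants, "--depths", depths, "--ref", reference],
        # Step 2: demixing into relative lineage abundances.
        ["freyja", "demix", variants, depths,
         "--output", f"{sample}_demixed.tsv"],
    ]
    if bootstrap:
        # Optional: bootstrap estimates of the lineage abundances.
        commands.append(
            ["freyja", "boot", variants, depths,
             "--nt", "4", "--nb", "100", "--output_base", f"{sample}_boot"]
        )
    return commands


for command in freyja_commands("sample1.bam", "reference.fasta", "sample1", bootstrap=True):
    print(shlex.join(command))
```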

<aside> The Freyja_FASTQ_PHB workflow is compatible with the following input data types:

Illumina Single-End
Illumina Paired-End
Oxford Nanopore

</aside>

Freyja_Update_PHB will copy the SARS-CoV-2 reference files (curated_lineages.json and usher_barcodes.feather) from the source repository to a user-specific Google Cloud Storage (GCS) location (often a Terra.bio workspace-associated bucket). These files can then be used as input for the Freyja_FASTQ_PHB workflow.

Two options are available to visualize the Freyja results: Freyja_Plot_PHB and Freyja_Dashboard_PHB. Freyja_Plot_PHB aggregates multiple samples using output from Freyja_FASTQ_PHB to generate a plot that shows fractional abundance estimates for all samples, with the option to plot sample collection date information. Alternatively, Freyja_Dashboard_PHB aggregates multiple samples using output from Freyja_FASTQ_PHB to generate an interactive visualization. This workflow requires an additional input field called viral load, which is the number of viral copies per liter.
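
To make the extra dashboard inputs concrete, the sketch below builds a small per-sample metadata table that pairs each sample with a collection date and a viral load in copies per liter. The column names are an assumption modeled on Freyja's example metadata and may not match what the workflow produces internally. The helper name and sample values are hypothetical.

```python
import csv
import io


def metadata_csv(samples):
    """Build CSV text from (sample_name, collection_date, viral_load) tuples.

    viral_load is expected in viral copies per liter.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed column names; verify against the Freyja version in use.
    writer.writerow(["Sample", "sample_collection_datetime", "viral_load"])
    for name, collection_date, viral_load in samples:
        if viral_load < 0:
            raise ValueError(f"negative viral load for sample {name}")
        writer.writerow([name, collection_date, viral_load])
    return buffer.getvalue()


# Hypothetical wastewater samples.
print(metadata_csv([
    ("ww_site1", "2023-01-05", 125000.0),
    ("ww_site2", "2023-01-12", 98000.0),
]))
```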

Freyja_Update_PHB

Freyja_FASTQ_PHB

Freyja_Plot_PHB

Freyja_Dashboard_PHB

Running Freyja on other pathogens

The main requirement to run Freyja on other pathogens is the existence of a barcode file for your pathogen of interest. Currently, barcodes exist for the following organisms:

MEASLES
MPXV
RSVa
RSVb

The appropriate barcode file and reference sequence need to be downloaded and uploaded to your Terra.bio workspace.

<aside> ⚠️ Data for various pathogens can be found in the following repository: Freyja Barcodes. Folders are organized by pathogen, with each subfolder named after the date the barcode was generated, using the format YYYY-MM-DD. Barcode files are named barcode.csv, and reference genome files are named reference.fasta.

</aside>
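
Given that folder layout, picking the most recent barcode set for a pathogen can be sketched as follows. The helper name and the example folder dates are hypothetical. Only the YYYY-MM-DD naming and the barcode.csv and reference.fasta file names come from the description above.

```python
from datetime import date


def latest_barcode_paths(pathogen, dated_folders):
    """Return (barcode_path, reference_path) for the newest YYYY-MM-DD subfolder."""
    parsed = []
    for name in dated_folders:
        try:
            parsed.append((date.fromisoformat(name), name))
        except ValueError:
            continue  # skip entries that are not dated folders
    if not parsed:
        raise ValueError(f"no dated folders found for {pathogen}")
    _, newest = max(parsed)
    base = f"{pathogen}/{newest}"
    return f"{base}/barcode.csv", f"{base}/reference.fasta"


# Hypothetical folder listing for one pathogen.
print(latest_barcode_paths("MPXV", ["2023-11-02", "2024-01-15", "README.md"]))
```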

When running Freyja_FASTQ_PHB, the appropriate reference and barcodes files need to be passed as inputs. The reference genome is a required input and will show up at the top of the workflow's inputs page on Terra.bio (Figure 2).

Figure 2: Required input for Freyja_FASTQ_PHB to provide the reference genome to be used by Freyja.

The barcodes file can be passed directly to Freyja by the freyja_usher_barcodes optional input (Figure 3).

Figure 3: Optional input for Freyja_FASTQ_PHB to provide the barcodes file to be used by Freyja.

<aside> ⚠️ It’s important to ensure that the update_db option is set to false so that default input files for SARS-CoV-2 are not inappropriately used!

</aside>
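
A quick sanity check of this rule before launching a run might look like the sketch below. The keys freyja_usher_barcodes and update_db are the input names mentioned above. Real Terra input names usually carry a workflow or task prefix, so the bare keys, the helper name, and the example bucket path are simplifying assumptions.

```python
def check_other_pathogen_inputs(inputs):
    """Return a list of problems found in a dict of workflow inputs."""
    problems = []
    custom_barcodes = inputs.get("freyja_usher_barcodes")
    if not custom_barcodes:
        problems.append("freyja_usher_barcodes is not set")
    if inputs.get("update_db", False):
        # Updating would bring in the default SARS-CoV-2 files.
        problems.append("update_db must be false when using custom barcodes")
    return problems


# Hypothetical inputs for a non-SARS-CoV-2 run.
example = {
    "freyja_usher_barcodes": "gs://my-bucket/MPXV/barcode.csv",
    "update_db": True,
}
print(check_other_pathogen_inputs(example))
```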

References

If you use any of the Freyja workflows, please cite:

Karthikeyan, S., Levy, J.I., De Hoff, P. et al. Wastewater sequencing reveals early cryptic SARS-CoV-2 variant transmission. Nature 609, 101–108 (2022). https://doi.org/10.1038/s41586-022-05049-6

Freyja source code can be found at ‣

Freyja barcodes (non-SARS-CoV-2): https://github.com/gp201/Freyja-barcodes