Mercury_Prep_N_Batch: programmatic submission preparation

About

Mercury prepares and formats metadata and sequencing files stored in Google Cloud Platform (GCP) buckets for submission to national and international databases, currently NCBI and GISAID. Mercury was initially developed to ingest read, assembly, and metadata files associated with SARS-CoV-2 amplicon reads from clinical samples, and to format that data for submission according to the Public Health Alliance for Genomic Epidemiology (PHA4GE) SARS-CoV-2 Contextual Data Specifications.

Currently, Mercury supports submission preparation for SARS-CoV-2, mpox, and influenza. These organisms have different metadata requirements and are submitted to different repositories. The following table lists, for each organism, the repositories for which Mercury can prepare submissions:

Organism	BankIt (NCBI)	BioSample (NCBI)	GenBank (NCBI)	GISAID	SRA (NCBI)
`"flu"`		✓			✓
`"mpox"`	✓	✓		✓	✓
`"sars-cov-2"`		✓	✓	✓	✓

<aside> ℹ️ Important note: Mercury was designed to work with metadata tables that are partially generated by the TheiaCoV workflows. If you are using a different pipeline, make sure your metadata table uses the column names Mercury expects. See this file for the hard-coded list of metadata fields expected for each organism.

</aside>
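One way to check that a metadata table is formatted as Mercury expects is to compare its header row against the required field names. This is a minimal sketch, assuming the table is available as tab-separated text (for example, a Terra data table download). `REQUIRED_COLUMNS` and `missing_columns` are hypothetical; the placeholder list reuses only three of Mercury's default column names and is not the full per-organism list from the hard-coded file mentioned above.

```python
import csv
import io

# HYPOTHETICAL placeholder: replace with the fields Mercury expects for your organism.
REQUIRED_COLUMNS = ["read1_dehosted", "assembly_fasta", "assembly_mean_coverage"]


def missing_columns(tsv_text: str, required: list[str]) -> list[str]:
    """Return the required column names absent from the TSV header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # an empty table yields an empty header
    present = set(header)
    return [col for col in required if col not in present]


if __name__ == "__main__":
    # Illustrative table with placeholder bucket paths; it lacks one required column.
    example = (
        "entity:sample_id\tread1_dehosted\tassembly_fasta\n"
        "S1\tgs://bucket/s1.fastq.gz\tgs://bucket/s1.fasta\n"
    )
    print(missing_columns(example, REQUIRED_COLUMNS))  # ['assembly_mean_coverage']
```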

Metadata formatters

To help you collect all required metadata, we have created the following Excel spreadsheets, which also make it easy to upload that metadata into your Terra data tables:

For flu
For mpox
For SARS-CoV-2
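If you assemble metadata outside these spreadsheets, Terra can load a data table from a tab-separated file whose first header cell is `entity:<table_name>_id`. The sketch below writes such a file from Python dictionaries; `to_terra_tsv` and the `"id"` key are hypothetical choices for illustration, and the column names you include must still match what Mercury expects.

```python
import csv
import io


def to_terra_tsv(table_name: str, rows: list[dict[str, str]]) -> str:
    """Serialize rows as a Terra data-table load file.

    Each row must carry its entity ID under the (hypothetical) key "id";
    all other keys become columns, sorted alphabetically.
    """
    fields = sorted({key for row in rows for key in row if key != "id"})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Terra identifies the table from the first header cell.
    writer.writerow([f"entity:{table_name}_id", *fields])
    for row in rows:
        writer.writerow([row["id"], *(row.get(field, "") for field in fields)])
    return out.getvalue()


if __name__ == "__main__":
    print(to_terra_tsv("sample", [{"id": "S1", "organism": "mpox"}]), end="")
```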

Usage on Terra

<aside> ℹ️ A note on the `using_clearlabs_data` and `using_reads_dehosted` optional input parameters

The `using_clearlabs_data` and `using_reads_dehosted` arguments change the default values of the `read1_column_name`, `assembly_fasta_column_name`, and `assembly_mean_coverage_column_name` metadata column inputs. The table below shows each default value and the value it takes when one or both arguments are used.

Variable	Default Value	with `using_clearlabs_data`	with `using_reads_dehosted`	with both `using_clearlabs_data` and `using_reads_dehosted`
`read1_column_name`	`"read1_dehosted"`	`"clearlabs_fastq_gz"`	`"reads_dehosted"`	`"reads_dehosted"`
`assembly_fasta_column_name`	`"assembly_fasta"`	`"clearlabs_fasta"`	`"assembly_fasta"`	`"clearlabs_fasta"`
`assembly_mean_coverage_column_name`	`"assembly_mean_coverage"`	`"clearlabs_assembly_coverage"`	`"assembly_mean_coverage"`	`"clearlabs_assembly_coverage"`
</aside>
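The table above follows a simple precedence: `using_clearlabs_data` switches all three columns to their Clear Labs names, and `using_reads_dehosted` then overrides only the read1 column. This is an illustrative sketch of that logic; `resolve_column_names` is a hypothetical name, not Mercury's own code.

```python
def resolve_column_names(using_clearlabs_data: bool = False,
                         using_reads_dehosted: bool = False) -> dict[str, str]:
    """Return the default metadata column names for a given flag combination."""
    columns = {
        "read1_column_name": "read1_dehosted",
        "assembly_fasta_column_name": "assembly_fasta",
        "assembly_mean_coverage_column_name": "assembly_mean_coverage",
    }
    if using_clearlabs_data:
        # Clear Labs outputs replace all three defaults.
        columns["read1_column_name"] = "clearlabs_fastq_gz"
        columns["assembly_fasta_column_name"] = "clearlabs_fasta"
        columns["assembly_mean_coverage_column_name"] = "clearlabs_assembly_coverage"
    if using_reads_dehosted:
        # Dehosted reads take precedence for read1, even with Clear Labs data.
        columns["read1_column_name"] = "reads_dehosted"
    return columns


if __name__ == "__main__":
    print(resolve_column_names(using_clearlabs_data=True, using_reads_dehosted=True))
```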

Inputs

Outputs

Usage outside of Terra

This tool can also be run from the command line. Please see the Mercury GitHub repository for instructions on running Mercury with a Docker image or in your local command-line environment.