The SRA_Fetch workflow downloads sequence data from NCBI’s Sequence Read Archive (SRA). Given an SRA run accession, it populates a Terra data table with the associated read files.

Inputs

The only input for the SRA_Fetch workflow is a run accession: either an SRA run accession, which begins with “SRR”, or an ENA run accession, which begins with “ERR”. Please see the NCBI Metadata and Submission Overview for assistance with identifying accessions: https://www.ncbi.nlm.nih.gov/sra/docs/submitmeta/. Briefly, NCBI-accessioned objects have the following naming scheme:

STUDY SRP#
SAMPLE SRS#
EXPERIMENT SRX#
RUN SRR#
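
The naming scheme above can be checked before launching the workflow. The following is a minimal sketch, not part of SRA_Fetch itself: the function names are hypothetical, and the pattern assumes an accession is a three-letter prefix followed only by digits.

```python
import re

# Prefixes come from the naming scheme above, plus the ENA run prefix "ERR"
# mentioned under Inputs. The "three letters then digits" shape is an assumption.
ACCESSION_TYPES = {
    "SRP": "STUDY",
    "SRS": "SAMPLE",
    "SRX": "EXPERIMENT",
    "SRR": "RUN",
    "ERR": "RUN",
}

_ACCESSION_RE = re.compile(r"^([A-Z]{3})(\d+)$")


def classify_accession(accession):
    """Return the object type for an accession, or None if it is not recognized."""
    match = _ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        return None
    return ACCESSION_TYPES.get(match.group(1))


def is_run_accession(accession):
    """True only for run accessions, the one input type SRA_Fetch accepts."""
    return classify_accession(accession) == "RUN"
```

A check like this would flag an experiment accession (SRX#) supplied by mistake in place of a run accession (SRR#).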

Tasks/Actions

Read files associated with the SRA run accession provided as input are copied to a Terra-accessible Google bucket. Hyperlinks to those files are shown in the “read1” and “read2” columns of the Terra data table.

Outputs

Output Name         Data Type   Description
read1               File        File containing the forward reads
read2               File        File containing the reverse reads (not available for single-end or ONT data)
fastq_dl_date       String      Date of download
fastq_dl_docker     String      Docker container used to run fastq-dl
fastq_dl_metadata   File        File containing metadata for the provided accession, such as submission_accession, library_selection, and instrument_platform, among others
fastq_dl_version    String      Version of fastq-dl used
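
The fastq_dl_metadata file can be inspected programmatically once downloaded. The sketch below assumes the file is tab-separated with a header row; that layout is an assumption and should be checked against an actual output. The column names are those listed in the table above, the values are invented for illustration, and read_metadata is a hypothetical helper.

```python
import csv
import io


def read_metadata(handle):
    """Parse a tab-separated metadata file (assumed layout) into a list of row dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Made-up contents for illustration only; a real file would be opened with
# open(path, newline="") and is expected to contain more columns.
example = io.StringIO(
    "submission_accession\tlibrary_selection\tinstrument_platform\n"
    "SRA000000\tRANDOM\tILLUMINA\n"
)

rows = read_metadata(example)
for row in rows:
    print(row["submission_accession"], row["instrument_platform"])
```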

References

This workflow relies on fastq-dl (https://github.com/rpetit3/fastq-dl), a very handy bioinformatics tool by Robert A. Petit III.
