The SRA_Fetch workflow downloads sequence data from NCBI’s Sequence Read Archive (SRA). Given a run accession, it populates a Terra data table with the associated read files.

Inputs

The only input for the SRA_Fetch workflow is a run accession: either an SRA run accession, which begins with “SRR”, or an ENA run accession, which begins with “ERR”. Please see the NCBI Metadata and Submission Overview for assistance with identifying accessions: https://www.ncbi.nlm.nih.gov/sra/docs/submitmeta/. Briefly, NCBI-accessioned objects have the following naming scheme:

STUDY	SRP#
SAMPLE	SRS#
EXPERIMENT	SRX#
RUN	SRR#
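
Because the workflow accepts only run accessions, it can help to check an accession's prefix before submitting it. The following is a minimal sketch, not part of the workflow itself: the function names are hypothetical, and the assumption that an accession is a three-letter prefix followed only by digits is ours, not something stated on this page.

```python
import re
from typing import Optional

# Assumption: a run accession is "SRR" or "ERR" followed by one or more digits.
RUN_ACCESSION = re.compile(r"(SRR|ERR)\d+")

# Prefixes from the naming scheme above, plus the ENA run prefix
# mentioned in the Inputs section.
PREFIX_TO_OBJECT = {
    "SRP": "STUDY",
    "SRS": "SAMPLE",
    "SRX": "EXPERIMENT",
    "SRR": "RUN",
    "ERR": "RUN",
}


def accession_object(accession: str) -> Optional[str]:
    """Return the object type implied by the accession prefix, or None if unknown."""
    return PREFIX_TO_OBJECT.get(accession.strip().upper()[:3])


def is_run_accession(accession: str) -> bool:
    """Return True if the accession looks like an SRA or ENA run accession."""
    return RUN_ACCESSION.fullmatch(accession.strip()) is not None
```

For example, `is_run_accession("SRX1234567")` is false because SRX denotes an experiment, so that value would need to be resolved to its run accession(s) before being used as input.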

Tasks/Actions

The read files associated with the input run accession are copied to a Terra-accessible Google bucket. Hyperlinks to those files appear in the “read1” and “read2” columns of the Terra data table.

Outputs

Output Name	Data Type	Description
`read1`	File	File containing the forward reads
`read2`	File	File containing the reverse reads (not available for single-end or ONT data)
`fastq_dl_date`	String	Date of download
`fastq_dl_docker`	String	Docker container used to run fastq-dl
`fastq_dl_metadata`	File	File containing metadata for the provided accession, such as submission_accession, library_selection, and instrument_platform, among others
`fastq_dl_version`	String	Version of fastq-dl used

References

This workflow relies on fastq-dl (https://github.com/rpetit3/fastq-dl), a very handy bioinformatics tool by Robert A. Petit III.
