The SRA_Fetch workflow downloads sequence data from NCBI’s Sequence Read Archive (SRA). It takes an SRA run accession and populates a Terra data table with the associated read files.
The only input for the SRA_Fetch workflow is a run accession: either an SRA run accession, which begins with “SRR”, or an ENA run accession, which begins with “ERR”. For help identifying accessions, see the NCBI Metadata and Submission Overview: https://www.ncbi.nlm.nih.gov/sra/docs/submitmeta/. Briefly, NCBI-accessioned objects follow this naming scheme:
| Object | Accession format |
|---|---|
| STUDY | SRP# |
| SAMPLE | SRS# |
| EXPERIMENT | SRX# |
| RUN | SRR# |
Only RUN-level accessions allow the workflow to succeed; study, sample, or experiment accessions will cause it to fail.
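Because only run-level accessions work, it can help to check accessions before launching the workflow. The following is a minimal sketch, not part of SRA_Fetch itself: the function names are hypothetical, and it assumes an accession is a three-letter prefix followed only by digits. The prefixes come from the table above plus the ENA “ERR” run prefix.

```python
import re
from typing import Optional

# Prefixes from the NCBI naming scheme above, plus ENA run accessions ("ERR").
ACCESSION_LEVELS = {
    "SRP": "STUDY",
    "SRS": "SAMPLE",
    "SRX": "EXPERIMENT",
    "SRR": "RUN",
    "ERR": "RUN",  # ENA run accession
}

# Assumption: three uppercase letters followed only by digits.
_ACCESSION_RE = re.compile(r"^([A-Z]{3})(\d+)$")


def accession_level(accession: str) -> Optional[str]:
    """Return the object level (e.g. "RUN") of an accession, or None if unrecognized."""
    match = _ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        return None
    return ACCESSION_LEVELS.get(match.group(1))


def is_valid_sra_fetch_input(accession: str) -> bool:
    """True only for run-level accessions, the only input SRA_Fetch accepts."""
    return accession_level(accession) == "RUN"


if __name__ == "__main__":
    for acc in ["SRR1234567", "ERR1234567", "SRX1234567", "SRP000001"]:
        level = accession_level(acc) or "unrecognized"
        status = "OK" if is_valid_sra_fetch_input(acc) else "will fail"
        print(f"{acc}: {level} ({status})")
```

Checking the prefix this way only catches the wrong object level; it cannot confirm that the accession actually exists in SRA or ENA.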
Read files associated with the SRA run accession provided as input are copied to your workspace’s associated Google bucket. Hyperlinks to those files are shown in the “read1” and “read2” columns of the Terra data table.
This workflow produces output columns for the read data: read1 and read2 for paired-end data, or only read1 for single-end data.
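The mapping from downloaded files to output columns can be sketched as follows. This is an illustration only, not the workflow’s actual code: `reads_to_columns` is a hypothetical helper, and it assumes that for paired-end data the forward-read file sorts before the reverse-read file (for example, names ending in `_1`/`_2`).

```python
from typing import Dict, List


def reads_to_columns(fastq_paths: List[str]) -> Dict[str, str]:
    """Map downloaded FASTQ paths to Terra data-table columns (hypothetical helper).

    Assumes paired-end file names sort with the forward reads first.
    """
    paths = sorted(fastq_paths)
    if len(paths) == 1:
        # Single-end data: only the read1 column is populated.
        return {"read1": paths[0]}
    if len(paths) == 2:
        # Paired-end data: forward reads go to read1, reverse reads to read2.
        return {"read1": paths[0], "read2": paths[1]}
    raise ValueError(f"expected 1 or 2 FASTQ files, got {len(paths)}")
```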
This workflow relies on fastq-dl (https://github.com/rpetit3/fastq-dl), a very handy bioinformatics tool by Robert A. Petit III.