The SRA_Fetch workflow downloads sequence data from NCBI’s Sequence Read Archive (SRA). It takes an SRA run accession as input and populates a Terra data table with the associated read files.
The only input for the SRA_Fetch workflow is a run accession: either an SRA run accession, which begins with “SRR”, or an ENA run accession, which begins with “ERR”. For help identifying accessions, see the NCBI Metadata and Submission Overview: https://www.ncbi.nlm.nih.gov/sra/docs/submitmeta/. Briefly, NCBI-accessioned objects follow this naming scheme:
Object | Accession format |
---|---|
STUDY | SRP# |
SAMPLE | SRS# |
EXPERIMENT | SRX# |
RUN | SRR# |
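As an illustration of the naming scheme, the Python sketch below maps an accession's three-letter prefix to its object type and checks whether it is a run accession that SRA_Fetch accepts (SRR or ERR). It assumes an accession is a three-letter prefix followed only by digits; the function names are hypothetical helpers, not part of the workflow.

```python
import re
from typing import Optional

# Prefix -> object type, following the NCBI naming scheme above.
# ERR (an ENA run accession) is included because SRA_Fetch also accepts it.
ACCESSION_TYPES = {
    "SRP": "STUDY",
    "SRS": "SAMPLE",
    "SRX": "EXPERIMENT",
    "SRR": "RUN",
    "ERR": "RUN",
}

# Assumed format: exactly three letters followed by one or more digits.
_ACCESSION_RE = re.compile(r"^([A-Z]{3})(\d+)$")


def accession_type(accession: str) -> Optional[str]:
    """Return the object type for an accession, or None if it is unrecognized."""
    # Normalize surrounding whitespace and case before matching.
    match = _ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        return None
    return ACCESSION_TYPES.get(match.group(1))


def is_valid_sra_fetch_input(accession: str) -> bool:
    """True only for run accessions (SRR or ERR), the sole input SRA_Fetch takes."""
    match = _ACCESSION_RE.match(accession.strip().upper())
    return match is not None and match.group(1) in ("SRR", "ERR")
```

For example, `accession_type("SRX000001")` returns `"EXPERIMENT"`, and `is_valid_sra_fetch_input("SRS123")` returns `False` because a sample accession is not a run accession.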
Read files associated with the input run accession are copied to a Terra-accessible Google bucket. Links to those files appear in the “read1” and “read2” columns of the Terra data table.
Output Name | Data Type | Description |
---|---|---|
read1 | File | File containing the forward reads |
read2 | File | File containing the reverse reads (not available for single-end or ONT data) |
fastq_dl_date | String | Date of download |
fastq_dl_docker | String | fastq-dl Docker container used |
fastq_dl_metadata | File | File containing metadata for the provided accession, such as submission_accession, library_selection, and instrument_platform |
fastq_dl_version | String | fastq-dl version used |
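Because read2 is empty for single-end and ONT data, it can help to check the fastq_dl_metadata file before expecting a reverse-read file. The Python sketch below assumes the metadata file is tab-separated with a header row, and that it carries `instrument_platform` and a `library_layout` column using ENA-style values (`OXFORD_NANOPORE`, `PAIRED`, `SINGLE`). The `library_layout` column and both function names are assumptions for illustration, not a documented format of this workflow.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def read_metadata(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per row of an (assumed) tab-separated file with a header."""
    return csv.DictReader(handle, delimiter="\t")


def expects_read2(row: Dict[str, str]) -> bool:
    """Guess whether a reverse-read (read2) file should exist for this run.

    Assumes ENA-style values; a missing or empty field is treated as unknown,
    which this sketch reports as False (no read2 expected).
    """
    platform = (row.get("instrument_platform") or "").upper()
    layout = (row.get("library_layout") or "").upper()
    if platform == "OXFORD_NANOPORE":
        # ONT data has no read2.
        return False
    # Only paired-end libraries produce a reverse-read file.
    return layout == "PAIRED"


if __name__ == "__main__":
    # Hypothetical example rows, not real SRA records.
    sample = (
        "run_accession\tinstrument_platform\tlibrary_layout\n"
        "SRR0000001\tILLUMINA\tPAIRED\n"
        "SRR0000002\tOXFORD_NANOPORE\tSINGLE\n"
    )
    for row in read_metadata(io.StringIO(sample)):
        print(row["run_accession"], expects_read2(row))
```

Run as a script, this prints `SRR0000001 True` and `SRR0000002 False`. A missing `library_layout` is treated conservatively as "no read2", so an empty read2 column is never flagged as an error by mistake.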
This workflow relies on fastq-dl (https://github.com/rpetit3/fastq-dl), a very handy bioinformatics tool by Robert A. Petit III.