
<aside> <img src="https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/39a844d4-3053-4f0c-a8c3-dc265e8f9325/Picture3.png" alt="https://prod-files-secure.s3.us-west-2.amazonaws.com/be290196-9090-4f3c-a9ab-fe730ad213e0/39a844d4-3053-4f0c-a8c3-dc265e8f9325/Picture3.png" width="40px" /> Running workflows on the command line requires working directly with WDL (Workflow Description Language), the language used to write and execute these workflows. Frank has put together a great video describing 📺 WDL Task and Workflow Files, and you can find full instructions below on running these WDL workflows.

</aside>

Step 1: Obtain the Workflow and Data

You will need to have access to the WDL workflow file (.wdl) and any associated input files (such as reference genomes, input data files, etc.). To do this, complete the following steps:

1. Install Git (if not already installed)

If Git is not already installed on your system, install it before continuing. The installation method depends on your operating system.
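
As a minimal sketch of this step, the snippet below reports whether Git is already on your PATH and, if it is not, prints an install command commonly used on the detected platform. The function name `suggest_git_install` is hypothetical, and the package-manager commands are typical defaults that may not match your system (for example, Linux distributions that do not use apt).

```shell
#!/bin/sh
# Hypothetical helper: print a commonly used Git install command for the
# detected platform. It only prints the command; it does not run it.
suggest_git_install() {
    case "$(uname -s)" in
        Linux)  echo "sudo apt-get install git   # Debian/Ubuntu; other distributions use their own package manager" ;;
        Darwin) echo "brew install git           # assumes Homebrew is installed" ;;
        *)      echo "Download an installer from https://git-scm.com/downloads" ;;
    esac
}

# Report the installed version, or suggest how to install Git.
if command -v git >/dev/null 2>&1; then
    git --version
else
    echo "Git not found. Suggested install command:"
    suggest_git_install
fi
```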

2. Clone the Repository

  1. Open your terminal.

  2. Create a directory where you want to store the cloned repository and navigate to it.

    mkdir /path/to/your/desired/new/directory
    cd /path/to/your/desired/new/directory
    
  3. Clone the public_health_bioinformatics repository from GitHub using the following command:

    git clone https://github.com/theiagen/public_health_bioinformatics.git
    
  4. After running the command, Git will download all the repository files and set up a local copy in a new public_health_bioinformatics folder inside the directory you created.

3. Navigate to the Cloned Repository

  1. Change your working directory to the newly cloned repository:

    cd public_health_bioinformatics
    
  2. You're now inside the cloned repository's directory. Here, you should find all the files and directories from the GitHub repository.

4. Verify the Cloned Repository

You can verify that the repository has been cloned successfully by listing the contents of the current directory using the ls (on Linux/macOS) or dir (on Windows) command:

    ls

This should display the files and directories within the public_health_bioinformatics repository.

Congratulations! You've successfully cloned the public_health_bioinformatics repository from GitHub to your local machine. You're now ready to run the bioinformatics analysis workflows using WDL, as described in the following steps.
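
Once the repository is cloned, it can help to see which workflow files it contains. The sketch below lists every `.wdl` file under a directory. The function name `list_wdl_files` is hypothetical, and nothing is assumed about where the repository keeps its workflows.

```shell
#!/bin/sh
# Hypothetical helper: list every .wdl file below a directory
# (defaults to the current directory), sorted by path.
list_wdl_files() {
    find "${1:-.}" -type f -name '*.wdl' | sort
}

# Example: run from inside the cloned repository.
list_wdl_files .
```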

Step 2: Install Docker and miniwdl

Docker and miniwdl are both required for command-line execution. The steps below check whether each is installed on your system and, if not, install it.

Docker

miniwdl

  1. Open your terminal.

  2. Navigate to the directory where your workflow and input files are located using the cd command:

    cd /path/to/your/workflow/directory
    
  3. Check if Docker is installed:

    docker --version
    

    If Docker is not installed, follow the official installation guide for your operating system: **https://docs.docker.com/get-docker/**

  4. Check if miniwdl is installed:

    miniwdl --version
    

    If miniwdl is not installed, you can install it using pip:

    pip install miniwdl
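
The two checks above can be combined into one small script. This is a sketch rather than part of the original instructions: `require_tool` is a hypothetical helper built on the standard `command -v` lookup, and it only reports what is missing instead of installing anything.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a command is on the PATH, otherwise report it.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check both prerequisites and summarize without stopping at the first failure.
missing=0
for tool in docker miniwdl; do
    require_tool "$tool" || missing=$((missing + 1))
done
echo "$missing prerequisite(s) missing"
```

Once both tools are present, `miniwdl run_self_test` runs a small built-in workflow and is a quick way to confirm that miniwdl can start Docker containers.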
    

Step 3: Set up the input.json file for your WDL workflow

In a WDL (Workflow Description Language) workflow, an input JSON file supplies the values (strings, file paths, numbers, and so on) for the workflow's input variables. The names in the JSON file must match the input names declared in the workflow file, which you can find in the Git repository that you cloned. Each input has a declared type, such as String, File, Int, Boolean, or Array, and its JSON value must be written accordingly: a String or File as a quoted string, an Int as a bare number, a Boolean as true or false, and an Array as a bracketed list.
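
To make the mapping between WDL types and JSON values concrete, the sketch below writes a small `input.json`. The workflow name `example_wf` and every input name and path in it are hypothetical placeholders, not inputs of any real workflow in the repository. The keys follow the common WDL convention of prefixing each input with the workflow name.

```shell
#!/bin/sh
# Write an illustrative input file. "example_wf" and all input names and paths
# below are hypothetical; replace them with the names declared in your .wdl file.
cat > input.json <<'EOF'
{
  "example_wf.samplename": "sample01",
  "example_wf.read1": "/path/to/your/sample01_R1.fastq.gz",
  "example_wf.min_depth": 10,
  "example_wf.skip_screen": false,
  "example_wf.extra_files": ["/path/to/your/file_a.txt", "/path/to/your/file_b.txt"]
}
EOF

# Optional syntax check, assuming python3 is available.
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool input.json >/dev/null && echo "input.json is valid JSON"
fi
```

In this sketch the five entries stand for a String, a File, an Int, a Boolean, and an Array of Files, in that order.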

Step 4: Execute the Workflow

Run the workflow using miniwdl with the following command, replacing your_workflow.wdl with the actual filename of your WDL workflow and input.json with the filename of your input JSON file.

    miniwdl run your_workflow.wdl --input input.json
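
Before launching a long run, it can be worth confirming that both files exist. The wrapper below is a sketch under stated assumptions: `run_workflow` is a hypothetical function name, `miniwdl_runs` is a hypothetical default directory, and the `DRY_RUN` variable is an addition made here so the command can be previewed without starting anything. The `--dir` option of `miniwdl run` names the directory under which miniwdl creates its run directory.

```shell
#!/bin/sh
# Hypothetical wrapper around `miniwdl run`.
# Usage: run_workflow WORKFLOW.wdl INPUT.json [RUN_DIR]
run_workflow() {
    wdl="$1"; inputs="$2"; run_dir="${3:-miniwdl_runs}"
    [ -f "$wdl" ]    || { echo "workflow file not found: $wdl" >&2; return 1; }
    [ -f "$inputs" ] || { echo "input file not found: $inputs" >&2; return 1; }
    if [ -n "${DRY_RUN:-}" ]; then
        # Preview only: print the command instead of executing it.
        echo "miniwdl run $wdl --input $inputs --dir $run_dir"
    else
        miniwdl run "$wdl" --input "$inputs" --dir "$run_dir"
    fi
}
```

For example, `DRY_RUN=1 run_workflow your_workflow.wdl input.json` prints the command that would be run, and the same call without `DRY_RUN` starts the workflow.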

Step 5: Monitor Workflow Progress

You can monitor the progress of the workflow by checking the console output for updates and log messages. This can help you identify any potential issues or errors during execution.

Tips for monitoring your workflow
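
One way to follow a run from a second terminal is to watch its log file. The sketch below assumes miniwdl's usual run-directory layout, in which a `_LAST` symbolic link points to the most recent run and a workflow's run directory contains a `workflow.log`. Check your own run directory if the layout differs. The function name `latest_workflow_log` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the newest run's workflow.log, if present.
# Argument: the directory under which miniwdl created its run directories
# (defaults to the current directory).
latest_workflow_log() {
    log="${1:-.}/_LAST/workflow.log"
    if [ -f "$log" ]; then
        echo "$log"
    else
        echo "no workflow.log found under ${1:-.}/_LAST" >&2
        return 1
    fi
}

# Follow the log as it grows (Ctrl+C here stops only tail, not the workflow):
#   tail -f "$(latest_workflow_log .)"
```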

What to do if you need to cancel a run
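
In an interactive terminal, pressing Ctrl+C in the window where miniwdl is running asks it to stop. For a run started in the background, a termination signal can be sent to its process ID instead. The sketch below is an illustration under assumptions: `cancel_run` is a hypothetical helper, and it relies on miniwdl aborting the run when it receives SIGTERM, which is worth confirming against the miniwdl documentation for your version.

```shell
#!/bin/sh
# Hypothetical helper: ask a background process to stop and wait for it to exit.
# Argument: the process ID, e.g. the value of $! right after starting the run.
cancel_run() {
    kill -TERM "$1" 2>/dev/null || { echo "no such process: $1" >&2; return 1; }
    # wait only works for children of this shell; its exit status is ignored.
    wait "$1" 2>/dev/null || true
    echo "sent SIGTERM to $1"
}

# Example (not executed here):
#   miniwdl run your_workflow.wdl --input input.json &
#   run_pid=$!
#   cancel_run "$run_pid"
```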

Step 6: Review Output

Once the workflow completes successfully, miniwdl reports the workflow outputs and stores the output files in the run directory it created for this execution. By default, that is a timestamped subdirectory of the directory where you started the run.
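
A quick way to inspect a finished run is to print its output summary and list the files it produced. The sketch below assumes miniwdl's usual layout, where a run directory holds an `outputs.json` file and an `out/` folder of links to the output files. The function name `show_run_outputs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize the outputs of a finished run directory.
# Argument: the run directory (defaults to the _LAST link in the current directory).
show_run_outputs() {
    run_dir="${1:-_LAST}"
    if [ -f "$run_dir/outputs.json" ]; then
        echo "== $run_dir/outputs.json =="
        cat "$run_dir/outputs.json"
    else
        echo "outputs.json not found in $run_dir (did the run finish successfully?)" >&2
        return 1
    fi
    # List output files under out/, if present. -L follows symbolic links,
    # since the entries there are expected to be links rather than copies.
    if [ -d "$run_dir/out" ]; then
        find -L "$run_dir/out" -type f | sort
    fi
}
```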

Conclusion

Reviewing the outputs of your bioinformatics workflow is a critical step to ensure the quality of your analysis. Logs, stderr, stdout, and generated output files provide valuable insights into the execution process and results. By carefully reviewing these outputs and addressing any issues, you can enhance the reliability and accuracy of your bioinformatics analysis.

Step 7: Troubleshooting and Debugging

  1. If the workflow encounters errors or fails to execute properly, review the error messages in the terminal.
  2. Check for any missing input files, incorrect paths, or issues related to software dependencies.
  3. Double-check your input JSON file to ensure that all required inputs are correctly specified.
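
When the terminal message is not enough, the standard error captured for each task is often the most direct clue. The sketch below prints the last lines of every `stderr.txt` found in a run directory. It assumes miniwdl's usual layout, in which each task's run folder contains a `stderr.txt`, and the function name `show_task_errors` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show the last lines of every task stderr.txt in a run directory.
# Argument: the run directory (defaults to the _LAST link in the current directory).
show_task_errors() {
    run_dir="${1:-_LAST}"
    [ -d "$run_dir" ] || { echo "run directory not found: $run_dir" >&2; return 1; }
    find -L "$run_dir" -type f -name 'stderr.txt' | sort | while IFS= read -r f; do
        echo "== $f =="
        tail -n 20 "$f"
    done
}

# Static validation of the workflow file before re-running (miniwdl's built-in checker):
#   miniwdl check your_workflow.wdl
```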

Congratulations! You have successfully executed a bioinformatics analysis workflow using WDL on the command line. This tutorial covered the basic steps to run a WDL workflow using the miniwdl command line tool.

Remember that the specific steps and commands might vary depending on the details of your workflow, software versions, and environment. Be sure to consult the documentation for miniwdl, WDL, and any other tools you're using for more advanced usage and troubleshooting.

Happy analyzing!