TheiaValidate performs basic comparisons between user-designated columns in two separate tables. We anticipate this workflow being run to determine whether any differences exist between version releases or between two workflows, such as TheiaProk_ONT vs TheiaProk_Illumina_PE. A summary PDF report is produced, along with an Excel spreadsheet that lists the values for any columns that do not have matching content for a sample.
<aside> ⚠️ The two tables being compared must contain the same number of samples, with identical sample names. If the number of samples differs, validation will not be attempted; if the sample names differ, validation will not work.
</aside>
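The requirement above can be checked before launching the workflow. The following is a minimal sketch, not the workflow's own code: the function name `check_tables_comparable` is hypothetical, and it assumes sample names are compared as exact strings and that row order does not matter.

```python
def check_tables_comparable(samples_a, samples_b):
    """Return (ok, reason) for two lists of sample names.

    Hypothetical pre-flight check; TheiaValidate's internal logic may differ.
    """
    # Unequal sample counts: validation would not be attempted.
    if len(samples_a) != len(samples_b):
        return False, "unequal number of samples"
    # Same count but different names: validation would not work.
    mismatched = sorted(set(samples_a) ^ set(samples_b))
    if mismatched:
        return False, "sample names differ: " + ", ".join(mismatched)
    return True, "ok"
```

Calling it with the sample-name columns of both tables gives a quick indication of whether the tables are likely to be accepted.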
In order to enable this workflow to function for different workflow series, we require users to provide a list of columns they want to compare between the two tables. Feel free to use the information below that Theiagen uses to compare versions of the three main workflow series as a starting point for your own validations:
If additional validation metrics are desired, the user can provide a validation_criteria_tsv file that specifies what type of comparison should be performed. There are several options for additional validation checks. For example, the SET match compares the items in a list-valued column (such as amrfinder_plus_genes, which is a comma-delimited list of genes) for identical content; order does not matter, so mdsA,mdsB is determined to be the same as mdsB,mdsA. The EXACT match does not consider these to be the same, but the SET match does.
Please note that all string inputs must be enclosed in quotes; for example, “column1,column2” or “workspace1”.
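The difference between the EXACT and SET matches can be illustrated with a short sketch. The function names are hypothetical and this is not the workflow's implementation. It assumes values are comma-delimited strings, that surrounding whitespace is ignored, and that duplicate items collapse (a consequence of using sets here).

```python
def exact_match(value_a, value_b):
    """EXACT-style comparison: the two strings must be identical."""
    return value_a == value_b


def set_match(value_a, value_b, sep=","):
    """SET-style comparison: same items, in any order (assumed semantics)."""
    def to_set(value):
        # Split on the separator and drop empty or whitespace-only items.
        return {item.strip() for item in value.split(sep) if item.strip()}

    return to_set(value_a) == to_set(value_b)


print(exact_match("mdsA,mdsB", "mdsB,mdsA"))  # False
print(set_match("mdsA,mdsB", "mdsB,mdsA"))    # True
```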
The optional validation_criteria_tsv file takes the following format (tab-delimited; a header line is required):
column_name criteria
columnB SET
columnC IGNORE
columnD 0.01
columnE EXACT
Please see the overview section for a description of all available criteria options (EXACT, IGNORE, SET, <PERCENT_DIFF>).
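To make the file format concrete, the sketch below parses a criteria file of this shape and applies each criterion to a pair of values. It is an illustration under stated assumptions, not the workflow's code: `load_criteria` and `values_agree` are hypothetical names, a numeric criterion such as 0.01 is assumed to be a fractional tolerance, and the percent difference is assumed to be the absolute difference divided by the mean magnitude of the two values. The actual formula used by TheiaValidate may differ.

```python
import csv
import io


def load_criteria(tsv_text):
    """Parse tab-delimited text with a 'column_name<TAB>criteria' header."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["column_name"]: row["criteria"] for row in reader}


def values_agree(criterion, value_a, value_b):
    """Apply one criterion to a pair of string values (assumed semantics)."""
    if criterion == "IGNORE":
        return True
    if criterion == "EXACT":
        return value_a == value_b
    if criterion == "SET":
        return set(value_a.split(",")) == set(value_b.split(","))
    # Anything else is treated as a numeric threshold (assumption:
    # a fraction, so 0.01 means the values may differ by up to 1%).
    threshold = float(criterion)
    x, y = float(value_a), float(value_b)
    if x == y:
        return True
    mean_magnitude = (abs(x) + abs(y)) / 2
    return abs(x - y) / mean_magnitude <= threshold


criteria = load_criteria("column_name\tcriteria\ncolumnB\tSET\ncolumnD\t0.01\n")
print(values_agree(criteria["columnB"], "mdsA,mdsB", "mdsB,mdsA"))
print(values_agree(criteria["columnD"], "100", "102"))
```

Reading the header with `csv.DictReader` means a file without the required header line fails early with a `KeyError` rather than silently misreading the first data row.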
If the above inputs are provided, then the following output files will be generated: