TheiaCoV Genomic Characterization
The kSNP3 workflow is for phylogenetic analysis of bacterial genomes using single nucleotide polymorphisms (SNPs). It identifies SNPs among a set of genome assemblies, then calculates phylogenetic trees based on those SNPs, including a pan-genome tree (_pan) and a core-genome tree (_core).
This workflow also features an optional module, summarize_data, that creates a presence/absence matrix for the analyzed samples from a list of indicated columns (such as AMR genes or plasmid types). If the phandango_coloring variable is set to true, the matrix is formatted for visualization in Phandango; otherwise it can be viewed in Excel.
The ksnp3 workflow is run on the set of assembly files to produce both pan-genome and core-genome phylogenies. It also produces alignment files, which snp-dists uses to generate a pairwise SNP distance matrix for both the pan-genome and the core genome.
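To make the idea of a pairwise SNP distance matrix concrete, here is a minimal Python sketch. It is not the snp-dists implementation; the function names are hypothetical, and the rule of counting only positions where both sequences carry A, C, G or T is an assumption made for this illustration.

```python
from itertools import combinations

VALID_BASES = set("ACGT")  # assumption: ignore gaps and ambiguous bases


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ (A/C/G/T only)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in VALID_BASES and b in VALID_BASES
    )


def distance_matrix(alignment):
    """Build a symmetric matrix (dict of dicts) from {sample_name: sequence}."""
    names = list(alignment)
    matrix = {n: {m: 0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = snp_distance(alignment[n], alignment[m])
        matrix[n][m] = d
        matrix[m][n] = d
    return matrix
```

Calling distance_matrix on a dictionary of aligned sequences returns a nested dictionary, so matrix["sample1"]["sample2"] gives the number of differing sites between those two samples.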
If you fill out the data_summary_* and sample_names optional variables, you can use the optional summarize_data task. The task takes a comma-separated list of column names from the Terra data table, each of which should contain a comma-separated list of items. For example, pass "amrfinderplus_virulence_genes, amrfinderplus_stress_genes" to summarize those output columns from a TheiaProk run. The task checks whether each of those items is present in each row of the data table (sample), then creates a CSV file of the results. The CSV file indicates presence (TRUE) or absence (empty) for each item. By default, the task adds a Phandango coloring tag to group items from the same column, but you can turn this off by setting phandango_coloring to false.
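The presence/absence logic described above can be sketched in Python as follows. This is an illustration only, not the summarize_data task itself: the function name, the input layout (a dictionary of samples mapping column names to comma-separated strings), and the "sample" header are assumptions, and the Phandango coloring tag is left out.

```python
import csv
import io


def presence_absence_csv(table, columns):
    """Return CSV text with TRUE/empty cells for each item found in `columns`.

    table: {sample_name: {column_name: "item1, item2"}} (hypothetical layout)
    """
    items = []        # every distinct item, in first-seen order
    per_sample = {}   # sample name -> set of items present in that sample
    for sample, row in table.items():
        found = set()
        for col in columns:
            cell = row.get(col) or ""
            for item in cell.split(","):
                item = item.strip()
                if item:
                    found.add(item)
                    if item not in items:
                        items.append(item)
        per_sample[sample] = found

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample"] + items)
    for sample, found in per_sample.items():
        writer.writerow([sample] + ["TRUE" if i in found else "" for i in items])
    return buf.getvalue()
```

In this sketch an item that appears under two different columns is merged into a single matrix column; keeping them separate would require prefixing each item with its source column name.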