PHBG workflows
The kSNP3 workflow performs phylogenetic analysis of bacterial genomes using single nucleotide polymorphisms (SNPs). It identifies SNPs among a set of genome assemblies, then calculates phylogenetic trees based on those SNPs:
- a pan-genome tree (_pan)
- a core-genome tree (_core)
This workflow also features an optional module, summarize_data, that creates a presence/absence matrix for the analyzed samples from a list of indicated columns (such as AMR genes, plasmid types, etc.). If the phandango_coloring variable is set to true, the matrix is formatted for visualization in Phandango; otherwise, it can be viewed in Excel.
The ksnp3 workflow is run on the set of assembly files to produce both pan-genome and core-genome phylogenies. It also produces alignment files, which are used by snp-dists to produce a pairwise SNP distance matrix for both the pan-genome and the core-genome.
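To make the idea of a pairwise SNP distance matrix concrete, here is a minimal Python sketch. It is not the snp-dists implementation; it only illustrates the concept on a toy alignment. The sample names, the sequences, and the choice to count only positions where both bases are A, C, G, or T are assumptions made for illustration.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned sequences.

    Assumption: positions where either sequence has a gap or an
    ambiguous base are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def distance_matrix(alignment: dict) -> dict:
    """Build a symmetric sample-by-sample matrix of SNP distances."""
    names = list(alignment)
    matrix = {n: {m: 0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        d = snp_distance(alignment[a], alignment[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


# Hypothetical toy alignment; real input would be a kSNP3 alignment file.
toy = {"sample1": "ACGTAC", "sample2": "ACGTTC", "sample3": "AC-TTG"}
matrix = distance_matrix(toy)
```

In this toy case, sample1 and sample2 differ at one position, and the gap in sample3 is ignored when comparing it with the other samples.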
The optional summarize_data task performs the following only if all of the data_summary_* and sample_names optional variables are filled out:
- It takes a list of column names, such as "amrfinderplus_virulence_genes,amrfinderplus_stress_genes", that can be found within the origin Terra data table.
- It checks each value in those columns across the sample set. For example, if the amrfinder_amr_genes column for a sample contains these values: "aph(3')-IIIa,tet(O),blaOXA-193", the summarize_data task will check each sample in the set to see whether it also has those AMR genes detected.
By default, this task appends a Phandango coloring tag so that all items from the same column are colored the same; this can be turned off by setting the optional phandango_coloring variable to false.
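The summarize_data behaviour described above can be sketched in Python as follows. This is an illustrative sketch, not the task's actual code: the function names, the in-memory table layout, the sample names, and the use of True/False to mark presence are all assumptions, and the Phandango coloring tag is deliberately left out because its exact format is not given here.

```python
def split_values(cell):
    """Split a comma-separated cell into stripped, non-empty values."""
    return [v.strip() for v in (cell or "").split(",") if v.strip()]


def presence_absence(table, summary_columns):
    """Return (items, matrix) for the chosen columns.

    table: {sample_name: {column_name: "comma,separated,values"}}
    matrix: {sample_name: {item: True or False}}
    """
    items = []        # every distinct item, in first-seen order
    per_sample = {}   # the set of items detected in each sample
    for sample, row in table.items():
        found = set()
        for column in summary_columns:
            for value in split_values(row.get(column, "")):
                found.add(value)
                if value not in items:
                    items.append(value)
        per_sample[sample] = found
    matrix = {
        sample: {item: item in found for item in items}
        for sample, found in per_sample.items()
    }
    return items, matrix


# Hypothetical samples; the gene names come from the example above.
toy_table = {
    "sample_A": {"amrfinder_amr_genes": "aph(3')-IIIa,tet(O),blaOXA-193"},
    "sample_B": {"amrfinder_amr_genes": "tet(O)"},
    "sample_C": {"amrfinder_amr_genes": ""},
}
items, matrix = presence_absence(toy_table, ["amrfinder_amr_genes"])
```

Each row of the resulting matrix is a sample and each column is an item, so writing it out as a CSV gives a table that can be opened in Excel.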