GAMBIT Overview

<aside> 💡 GAMBIT determines the taxon of a query genome assembly using a k-mer-based approach that matches the assembly sequence to the closest complete genome in a database. If the distance between the query assembly and that closest genome is within the built-in threshold for its species, GAMBIT assigns the query genome to that species. Species thresholds are set through a combination of automated and manual curation, based on the diversity within each taxon. For more details on how GAMBIT works, see the GAMBIT publication and the GAMBIT software documentation.

</aside>
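
The threshold decision described above can be sketched in a few lines of Python. This is a toy illustration only, not GAMBIT's implementation: the plain k-mer sets, the Jaccard distance, the value `k=4`, the `<=` comparison, and the names `kmer_set`, `jaccard_distance`, `classify`, and `REFERENCES` are all assumptions made for the example, and the sequences and thresholds are invented.

```python
from typing import Dict, List, Optional, Set


def kmer_set(seq: str, k: int = 4) -> Set[str]:
    """Collect every k-length substring of a sequence (a toy 'signature')."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |A & B| / |A | B|: 0.0 for identical sets, 1.0 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def classify(query_seq: str, references: List[Dict], k: int = 4) -> Optional[str]:
    """Return the species of the closest reference if it is within that
    species' threshold; otherwise return None (no species call)."""
    if not references:
        return None
    query = kmer_set(query_seq, k)
    # Distance from the query to every reference genome.
    scored = [
        (jaccard_distance(query, kmer_set(ref["sequence"], k)), ref)
        for ref in references
    ]
    distance, closest = min(scored, key=lambda pair: pair[0])
    # Hypothetical rule: accept the call only within the species threshold.
    if distance <= closest["threshold"]:
        return closest["species"]
    return None


# Invented reference data, for illustration only.
REFERENCES = [
    {"species": "Species A", "sequence": "ATGACCGTTAGCATGACCGTTAGC", "threshold": 0.3},
    {"species": "Species B", "sequence": "GGGTTTCCCAAAGGGTTTCCCAAA", "threshold": 0.3},
]

# A query identical to the Species A reference has distance 0.0.
print(classify("ATGACCGTTAGCATGACCGTTAGC", REFERENCES))  # Species A
# A query sharing no 4-mers with any reference gets no species call.
print(classify("TATATATATATATATA", REFERENCES))  # None
```

In this sketch, a query that is closest to a reference but falls outside that species' threshold yields no species call, mirroring the rule in the callout above.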

GAMBIT databases consist of two files:

  1. a signatures file containing the GAMBIT signatures (compressed representations) of all genomes represented in the database, which typically has the file extension “.gs”, and

  2. a metadata file relating the represented genomes to their genome accessions, taxonomic identifications, and species thresholds, which typically has the file extension “.gdb”.
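
As a minimal sketch, assuming both files sit in one directory and use the typical “.gs” and “.gdb” extensions, the pair can be located with Python's standard library before running an analysis. The helper name `find_database_files` and the directory layout are hypothetical and not part of GAMBIT itself.

```python
from pathlib import Path
from typing import Optional, Tuple


def find_database_files(db_dir: Path) -> Tuple[Optional[Path], Optional[Path]]:
    """Return (signatures_file, metadata_file) found in db_dir.

    Each element is None when no file with the expected extension exists.
    If several files match, the first in sorted order is returned.
    """
    signatures = sorted(db_dir.glob("*.gs"))   # GAMBIT signatures file
    metadata = sorted(db_dir.glob("*.gdb"))    # genome metadata file
    return (
        signatures[0] if signatures else None,
        metadata[0] if metadata else None,
    )
```

A caller can check that neither element is `None` to confirm the database directory is complete before use.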

Because GAMBIT databases have built-in species thresholds, the genomes included in each database version and the threshold associated with each species are curated prior to release. Curation approaches may vary by GAMBIT database but aim to ensure that mislabeled genomes are removed and that species are non-overlapping. Although every database is curated and tested before release, its quality remains limited by the availability and accuracy of sequencing data in public repositories.


GAMBIT Prokaryotic Databases

GAMBIT RefSeq Curated Database v1.0.0

GAMBIT RefSeq Curated Database v1.1.0

GAMBIT Curated Database v1.2.0

GAMBIT Curated Database v1.3.0

GAMBIT GTDB Database v2.0.0

GAMBIT Fungal Databases

GAMBIT Fungal Database v0.2.0