UHGG database for taxonomic profiling contains sequences with long unintentional stretches of N’s

The Unified Human Gastrointestinal Genome database, UHGG (v2.0.1), available from Download Curated Microbial Reference Database, contains more than 200 genomes in which all or large parts of the sequence mistakenly consist of N’s. Both the sequence list and the taxonomic profiling index are affected.

When this database is used for Taxonomic Profiling, no reads will map to these stretches of N’s. Consequently, the affected strains or species will go undetected by the analysis, or their abundance will be underestimated.

Examples of affected genomes are Escherichia coli, Listeria monocytogenes, Pseudomonas aeruginosa, Salmonella enterica, and Staphylococcus aureus.
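
For readers who want to see which sequences in a reference are dominated by N’s, the following Python sketch may help. It is an illustration only, not part of the Workbench or of any official fix. It assumes the sequences are available as FASTA-formatted text; the function names, the example records, and the 0.5 threshold are hypothetical choices made for this example.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n_fraction(sequence: str) -> float:
    """Return the fraction of positions that are N (case-insensitive)."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


def flag_n_rich(lines: Iterable[str], threshold: float = 0.5) -> Dict[str, float]:
    """Map header -> N fraction for sequences at or above the threshold.

    The threshold is an arbitrary illustrative cut-off, not an official one.
    """
    flagged: Dict[str, float] = {}
    for header, sequence in read_fasta(lines):
        fraction = n_fraction(sequence)
        if fraction >= threshold:
            flagged[header] = fraction
    return flagged


if __name__ == "__main__":
    # Hypothetical toy records; real input would be the lines of a FASTA file.
    example = [">genome_A", "ACGTNNNN", "NNNNNNNN", ">genome_B", "ACGTACGT"]
    print(flag_n_rich(example))  # {'genome_A': 0.75}
```

For a real file, the lines of an open file handle can be passed directly to flag_n_rich. Sequences flagged this way are the ones to which few or no reads can map.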

Recommendation

If you have run Taxonomic Profiling on samples using the UHGG (v2.0.1) taxonomic profiling index, we recommend that you rerun your data using the corrected UHGG (v2.0.1_1) database. Alternatively, use one of the QMI-PTDB databases, which are also available from Download Curated Microbial Reference Database.

How to check which reference index you used for taxonomic profiling results

Open the taxonomic profiling abundance table output and go to History View by clicking the Show History icon under the View area. Select Taxonomic Profiling. The name of the applied reference index is listed under Parameters. If it states UHGG (v2.0.1), your results are affected.

Affected database

  • This issue affects reference database UHGG (v2.0.1), both sequence list and taxonomic profiling index.
  • This issue was addressed in reference database UHGG (v2.0.1_1), both sequence list and taxonomic profiling index.