ClinVar from the June 22, 2022 update of CLC reference data is incomplete

Issue description

The ClinVar 20220528 reference data elements provided within the QIAGEN Sets "hg19 (Ensembl)" version 3 and "hg38 (Ensembl)" version 3 (Figure 1) are incomplete: they contain only 198 non-reference alleles, where more than 1.4 million should be present.

Figure 1. The reference data set hg19 (Ensembl) version 3, where ClinVar is incomplete, as seen in the Reference Data Manager.
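
As a rough illustration of the allele counts quoted above, the following Python sketch counts non-reference (ALT) alleles in a VCF file and compares the total against the two figures in this notice. It assumes you have exported the ClinVar element to an uncompressed VCF file yourself; the function names are hypothetical and are not part of CLC Genomics Workbench.

```python
# Figures quoted in this notice.
INCOMPLETE_ALLELE_COUNT = 198      # alleles in the incomplete element
EXPECTED_MINIMUM = 1_400_000       # a complete element has more than this


def count_non_reference_alleles(lines):
    """Count ALT alleles across the data lines of a VCF (hypothetical helper)."""
    total = 0
    for line in lines:
        # Skip blank lines and header lines (## metadata and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a well-formed VCF data line
        # Column 5 is ALT; multiple alleles are comma-separated, "." means none.
        total += sum(1 for allele in fields[4].split(",") if allele not in ("", "."))
    return total


def classify(count):
    """Compare a count with the figures in this notice (assumed thresholds)."""
    if count == INCOMPLETE_ALLELE_COUNT:
        return "incomplete"
    if count > EXPECTED_MINIMUM:
        return "complete"
    return "unclear"
```

For example, `with open("clinvar_export.vcf") as handle: print(classify(count_non_reference_alleles(handle)))`, where `clinvar_export.vcf` is a placeholder file name. A result of "unclear" means only that the count matches neither figure given here.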

If you have downloaded affected reference data sets and have run analyses that include an annotation step making use of ClinVar, relevant variant annotations may be missing in your results.
The affected data elements were released on June 21, 2022, and new, complete elements were released on August 19, 2022. Thus, only hg19 or hg38 Ensembl reference data sets, or ClinVar elements, downloaded between these dates using the Workbench Reference Data Manager are affected.
Some Template Workflows delivered with the Biomedical Genomics Analysis 22.1 plugin, released on June 28, 2022, are pre-configured to use the affected hg19 reference data. Details of the workflows affected are provided below, along with recommended actions to take.

Data and workflows affected

Affected Reference Data

  • clinvar_20220528_hg19 in the reference data set "hg19 (Ensembl)" version 3.
  • clinvar_20220528_hg38_no_alt_analysis_set in the reference data set "hg38 (Ensembl)" version 3.

ClinVar databases distributed with other versions of "hg19 (Ensembl)" and "hg38 (Ensembl)" reference data sets are not affected.
The reference data sets were available for download between June 21, 2022, and August 19, 2022, using Genomics Workbench 22.0 and newer. Earlier versions of Genomics Workbench are not affected.
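
If you keep your own record of what was downloaded and when, the criteria above can be expressed as a small check. The following is a minimal Python sketch under stated assumptions: the function name is hypothetical, the Workbench version is given as a (major, minor) tuple, and both boundary dates are treated as inside the affected window, which is a conservative choice rather than something this notice specifies.

```python
from datetime import date

# Element names and dates taken from this notice.
AFFECTED_ELEMENTS = {
    "clinvar_20220528_hg19",
    "clinvar_20220528_hg38_no_alt_analysis_set",
}
RELEASED = date(2022, 6, 21)   # incomplete elements released
REPLACED = date(2022, 8, 19)   # complete elements released
MIN_WORKBENCH = (22, 0)        # Genomics Workbench 22.0 and newer


def is_affected_download(element_name, downloaded_on, workbench_version):
    """Return True if a download matches the affected criteria (hypothetical helper).

    Assumption: both boundary dates count as affected.
    """
    return (
        element_name in AFFECTED_ELEMENTS
        and RELEASED <= downloaded_on <= REPLACED
        and workbench_version >= MIN_WORKBENCH
    )
```

For example, `is_affected_download("clinvar_20220528_hg19", date(2022, 7, 1), (22, 0))` evaluates to `True`, while the same element downloaded on September 1, 2022 does not match.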

Affected Workflows

The following Template Workflows provided by the Biomedical Genomics Analysis 22.1 plugin are configured to use the affected reference data:

  • Annotate Variants (TAS)
  • Annotate Variants (WES)
  • Annotate Variants (WGS)
  • Annotate Variants (WTS)
  • Compare Variants in DNA and RNA
  • Filter Causal Variants (TAS-HD)
  • Filter Causal Variants (WES-HD)
  • Filter Causal Variants (WGS-HD)
  • Filter Somatic Variants (TAS)
  • Filter Somatic Variants (WES)
  • Filter Somatic Variants (WGS)
  • Identify and Annotate Variants (TAS)
  • Identify and Annotate Variants (TAS-HD)
  • Identify and Annotate Variants (WES)
  • Identify and Annotate Variants (WES-HD)
  • Identify Candidate Variants and Genes from Tumor Normal Pair
  • Identify Variants and Add Expression Values
  • Identify Causal Inherited Variants in Family of Four (TAS) (legacy)
  • Identify Causal Inherited Variants in Family of Four (WES) (legacy)
  • Identify Causal Inherited Variants in Family of Four (WGS) (legacy)
  • Identify Causal Inherited Variants in Trio (TAS) (legacy)
  • Identify Causal Inherited Variants in Trio (WES) (legacy)
  • Identify Causal Inherited Variants in Trio (WGS) (legacy)
  • Identify Rare Disease Causing Mutations in Family of Four (TAS) (legacy)
  • Identify Rare Disease Causing Mutations in Family of Four (WES) (legacy)
  • Identify Rare Disease Causing Mutations in Family of Four (WGS) (legacy)
  • Identify Rare Disease Causing Mutations in Trio (TAS) (legacy)
  • Identify Rare Disease Causing Mutations in Trio (WES) (legacy)
  • Identify Rare Disease Causing Mutations in Trio (WGS) (legacy)

Recommendations

The ClinVar reference data element

If you have specified an affected ClinVar data element manually for an analysis, please download an updated data element and re-run your analyses.
Go to References in the top right corner of the CLC Genomics Workbench. Choose QIAGEN Sets and under Reference Data Elements, select a ClinVar reference data element and choose Download. Note that only complete versions of ClinVar are available for download under Reference Data Elements.
If you have downloaded the affected ClinVar reference data elements, delete them. Go to References in the top right corner of the CLC Genomics Workbench. Choose QIAGEN Sets and under Previous Reference Data Elements Sets, select clinvar_20220528_hg19 or clinvar_20220528_hg38_no_alt_analysis_set and choose Delete (Figure 2).

Figure 2. Delete downloaded ClinVar 20220528 reference data elements via the Reference Data Manager.

ClinVar used in workflows as part of a reference data set

After a reference data update on August 19, 2022, it is no longer possible to select the affected reference data sets when launching the affected Template Workflows.
However, if you are using the Biomedical Genomics Analysis 22.1 plugin, have downloaded the affected hg19 Ensembl reference data set, and launch an affected Template Workflow with the option “Use the default reference data” selected, the incomplete clinvar_20220528_hg19 reference data element will be used to annotate variants unless you take the action described below.

If you have the Biomedical Genomics Analysis 22.1 plugin installed and have downloaded the affected reference data for hg19

Delete the affected ClinVar reference data element following the steps described above. If the incomplete ClinVar element is not deleted, affected Template Workflows will suggest “Use the default reference data” by default when launched (Figure 3). If this option is selected, the workflow will proceed using the incomplete ClinVar database.
If you choose not to delete the incomplete ClinVar reference data element, you can avoid using it by checking “Select a reference set to use” and choosing one of the reference data sets under QIAGEN Active or QIAGEN previous (Figure 3).

Figure 3. When clinvar_20220528_hg19 has been downloaded, affected workflows will suggest “Use the default reference data” by default (red box). Instead choose “Select a reference set to use” to specify one of the reference data sets, which all contain complete ClinVar databases (green box).

Other situations

  • If you have the Biomedical Genomics Analysis 22.1 plugin installed and have downloaded the affected reference data for hg38, you will not be able to run the affected Template Workflows using default reference data (Figure 4). Instead, please choose one of the reference data sets under QIAGEN Active or QIAGEN previous.
  • If you have the Biomedical Genomics Analysis plugin 22.1 installed and have NOT downloaded the affected reference data, you will not be able to run the affected Template Workflows using default reference data (Figure 4). Instead, please choose one of the reference data sets under QIAGEN Active or QIAGEN previous.

Figure 4. Affected Template Workflows using the ClinVar 20220528 database can no longer be run using default reference data.

  • If you have a version of the Biomedical Genomics Analysis plugin other than 22.1 installed, the affected reference data sets are not available when you launch an affected Template Workflow. Choose any of the available reference data sets.
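
The situations described in this section can be summarized as a small decision function. This is a sketch of the guidance in this notice only, not of how the Workbench itself decides. The function name and return strings are hypothetical, and it assumes the plugin version is given as a string such as "22.1".

```python
AFFECTED_PLUGIN = "22.1"  # Biomedical Genomics Analysis plugin version named in this notice


def recommended_action(plugin_version, affected_hg19_downloaded):
    """Return the recommendation for launching an affected Template Workflow.

    Hypothetical helper; the returned text paraphrases this notice.
    """
    if plugin_version != AFFECTED_PLUGIN:
        # Other plugin versions do not offer the affected reference data sets.
        return "Choose any of the available reference data sets."
    if affected_hg19_downloaded:
        # Default reference data would use the incomplete clinvar_20220528_hg19.
        return (
            "Delete the affected ClinVar element, or choose "
            "'Select a reference set to use' and pick a QIAGEN reference data set."
        )
    # Plugin 22.1 with only affected hg38 data, or no affected data, downloaded:
    # the workflows cannot be run using default reference data.
    return "Choose a reference data set under QIAGEN Active or QIAGEN previous."
```

For instance, `recommended_action("22.1", affected_hg19_downloaded=True)` returns the advice to delete the element or select a reference set explicitly.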