Duplicates

Data quality is very important to us. Our moderation team therefore invests additional time in identifying false positives and potential duplicates.

Identification & Handling

On rare occasions, a duplicate is added to the database. As soon as we become aware of it, we initiate the following process:

  1. Identify the parent entry, which is usually the first entry that was added to the database
  2. Merge the data of the new duplicate into the existing original
  3. Flag the duplicate as such and reference the original
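
The three steps above can be sketched in code. This is only an illustration under stated assumptions, not the actual moderation tooling: the Entry class, its attributes, and the rule that existing values of the original take precedence during the merge are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """Hypothetical, simplified model of a database entry."""
    entry_id: int                      # e.g. 233216
    created: int                       # hypothetical creation timestamp
    data: dict = field(default_factory=dict)
    replaced_by: Optional[int] = None  # set on the duplicate
    replaces: list = field(default_factory=list)  # set on the original


def merge_duplicates(a: Entry, b: Entry) -> Entry:
    """Merge two entries describing the same issue and return the parent."""
    # Step 1: the parent is usually the entry that was added first.
    parent, duplicate = sorted((a, b), key=lambda e: e.created)

    # Step 2: merge the duplicate's data into the original.
    # Assumption: values already present in the original are kept.
    for key, value in duplicate.data.items():
        parent.data.setdefault(key, value)

    # Step 3: flag the duplicate and cross-reference both entries.
    duplicate.replaced_by = parent.entry_id
    parent.replaces.append(duplicate.entry_id)
    return parent
```

Sorting by creation time mirrors the "usually the first entry" rule; a moderator could still choose a different parent in exceptional cases.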

Behavior of Entries

This process affects the service as follows, illustrated here by VDB-243107, which was replaced by VDB-233216:

  • The duplicate is hidden in most overview lists on the web site (e.g. recent, archive, search)
  • Accessing the duplicate entry triggers an HTTP redirect to the correct entry
  • The Diff and History views of a duplicate remain accessible
  • Accessing the duplicate via the API returns the obsolete duplicate's data, which includes the additional field entry_replacedby (this field indicates that the entry is a duplicate that has been merged)
  • Accessing the correct entry returns the correct entry's data, which may also include the field entry_replaces as a backlink to the duplicate
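
An API client can use entry_replacedby to follow a merged duplicate to its replacement. The following is a minimal sketch, assuming that the field holds the identifier of the replacing entry and that a caller-supplied fetch function returns one entry as a dictionary. Both the fetch callable and the exact response layout are assumptions made for illustration, not the documented API.

```python
from typing import Callable


def resolve_entry(entry_id: str,
                  fetch: Callable[[str], dict],
                  max_hops: int = 5) -> dict:
    """Follow entry_replacedby until a non-duplicate entry is reached.

    fetch is a hypothetical callable that returns the data of one entry
    as a dict, for example a thin wrapper around an API request.
    """
    seen = set()
    current = entry_id
    for _ in range(max_hops):
        if current in seen:
            raise ValueError(f"replacement loop detected at {current}")
        seen.add(current)

        record = fetch(current)
        # Assumption: the field contains the ID of the replacing entry.
        replaced_by = record.get("entry_replacedby")
        if not replaced_by:
            return record  # not a duplicate, so this is the correct entry
        current = str(replaced_by)

    raise ValueError(f"more than {max_hops} replacements for {entry_id}")
```

The hop limit and the loop check are defensive choices in this sketch; a single merge normally needs only one hop.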

CVE Duplicates

If a CVE is a duplicate, we handle it as follows:

  • If we are the responsible CNA, we will flag the duplicate CVE as such and revoke it
  • If we are not the responsible CNA, we will add the duplicate CVE to our existing entry

Split Methodology

Please also consult our splitting methodology, which may affect duplicate handling.
