Duplicates

Data quality is very important to us. Our moderation team therefore invests additional time in identifying false positives and potential duplicates.

Identification & Handling

On rare occasions, a duplicate is added to the database. As soon as we become aware of it, we initiate the following process:

  1. Identify the parent entry, which is usually the entry that was added to the database first
  2. Merge the data of the newer duplicate into the existing original
  3. Flag the duplicate as such and reference the original
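The three steps above can be sketched as follows. This is a minimal illustration, not our actual implementation: the Entry class, its attributes and the merge policy (values already present on the parent are kept, missing ones are copied over from the duplicate) are assumptions. Only the field names entry_replacedby and entry_replaces are taken from the API behavior described below.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    # Hypothetical, simplified entry model for illustration only.
    entry_id: int
    added: int  # assumed: Unix timestamp of when the entry was added
    data: dict = field(default_factory=dict)
    entry_replacedby: Optional[int] = None  # set on the duplicate
    entry_replaces: list = field(default_factory=list)  # backlinks on the original


def merge_duplicates(entries: list) -> Entry:
    """Merge a group of entries describing the same issue into one parent."""
    # 1. The parent is usually the entry added first (ties broken by ID).
    parent = min(entries, key=lambda e: (e.added, e.entry_id))
    for dup in entries:
        if dup is parent:
            continue
        # 2. Merge the duplicate's data; values already on the parent win.
        for key, value in dup.data.items():
            parent.data.setdefault(key, value)
        # 3. Flag the duplicate and link both entries to each other.
        dup.entry_replacedby = parent.entry_id
        parent.entry_replaces.append(dup.entry_id)
    return parent
```

Keeping the parent's existing values is only one possible policy; in practice a moderator decides which data from the duplicate is more accurate.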

Behavior of Entries

This has the following effects on the service:

  • The duplicate is hidden in all overview lists on the website (e.g. recent, archive, search)
  • Accessing the duplicate triggers an HTTP redirect to the original entry
  • Accessing the duplicate via the API returns the obsolete duplicate entry data, which contains the additional field entry_replacedby (this field indicates that the entry is a duplicate that has been merged)
  • Accessing the original entry via the API returns the current entry data, which may also contain the field entry_replaces for backlinking purposes
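An API client can mirror the website behavior by hiding entries that carry entry_replacedby and by following that field to the original. The sketch below assumes each entry is available as a flat dictionary and uses a hypothetical fetch callable in place of a real API request; the actual response structure may differ.

```python
from typing import Callable


def visible_entries(entries: list) -> list:
    """Drop merged duplicates, as the overview lists on the website do."""
    return [e for e in entries if not e.get("entry_replacedby")]


def resolve_entry(entry_id: int, fetch: Callable[[int], dict], max_hops: int = 5) -> dict:
    """Follow entry_replacedby links until an entry without one is reached.

    `fetch` is a hypothetical callable that returns one entry as a dict.
    `max_hops` guards against unexpected replacement cycles.
    """
    current = fetch(entry_id)
    hops = 0
    while current.get("entry_replacedby"):
        hops += 1
        if hops > max_hops:
            raise RuntimeError(f"entry {entry_id}: too many replacement hops")
        current = fetch(int(current["entry_replacedby"]))
    return current
```

Following the link on the client side is the API counterpart of the HTTP redirect the website performs for duplicates.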

CVE Duplicates

If a CVE is a duplicate, we handle it as follows:

  • If we are the responsible CNA, we flag the duplicate CVE as such and revoke it
  • If we are not the responsible CNA, we add the duplicate CVE to our existing entry
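The decision above depends only on who the responsible CNA is. The following sketch expresses it as a small function; the CveAction enum, the function name and the list of CVE IDs attached to an entry are hypothetical and serve only to illustrate the two cases.

```python
from enum import Enum


class CveAction(Enum):
    REVOKE = "flag the duplicate CVE and revoke it"
    ADD_TO_ENTRY = "add the duplicate CVE to the existing entry"


def handle_duplicate_cve(cve_id: str, we_are_cna: bool, entry_cves: list) -> CveAction:
    """Decide how to treat a duplicate CVE (hypothetical helper)."""
    if we_are_cna:
        # We assigned the CVE ourselves, so we can revoke the duplicate.
        return CveAction.REVOKE
    # Another CNA owns the CVE: keep it, but attach it to our existing entry.
    if cve_id not in entry_cves:
        entry_cves.append(cve_id)
    return CveAction.ADD_TO_ENTRY
```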

Split Methodology

Please also consider our splitting methodology, which may affect duplicate handling as well.
