Duplicates
Data quality is very important to us. Our moderation team therefore invests additional time in identifying false positives and potential duplicates.
Identification & Handling
On rare occasions, a duplicate is added to the database. As soon as we become aware of it, we initiate the following process:
- Identify the parent entry, which is usually the first one added to the database
- Merge the data of the new duplicate into the existing original
- Flag the duplicate as such and reference the original
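The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not VulDB's implementation: the Entry class and the merge_duplicates function are hypothetical, only the field names entry_replacedby and entry_replaces are taken from this page, the parent is assumed to be the entry with the lowest ID (i.e. the first one added), and "merging" is assumed to mean copying over fields the parent does not have yet.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    # Hypothetical, simplified entry model (not the actual VulDB schema).
    id: int
    data: Dict[str, str] = field(default_factory=dict)
    entry_replacedby: Optional[int] = None  # set on a merged duplicate
    entry_replaces: List[int] = field(default_factory=list)  # backlinks on the parent


def merge_duplicates(entries: List[Entry]) -> Entry:
    """Merge a group of entries describing the same issue into one parent."""
    # Step 1: identify the parent, assumed to be the entry with the lowest ID,
    # i.e. the first one added to the database.
    parent = min(entries, key=lambda e: e.id)
    for dup in entries:
        if dup is parent:
            continue
        # Step 2: merge the duplicate's data; values already on the parent win.
        for key, value in dup.data.items():
            parent.data.setdefault(key, value)
        # Step 3: flag the duplicate, reference the parent, and add a backlink.
        dup.entry_replacedby = parent.id
        parent.entry_replaces.append(dup.id)
    return parent
```

Letting existing values on the parent take precedence is one possible conflict rule; in practice a moderator would likely review conflicting fields by hand.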
Behavior of Entries
This has the following effects on the service, demonstrated here with VDB-243107, which was replaced by VDB-233216:
- The duplicate is hidden in most overview lists on the website (e.g. recent, archive, search)
- Accessing the duplicate entry triggers an HTTP redirect to the correct entry
- The diff and history views of a duplicate remain accessible
- Accessing the duplicate via the API returns the obsolete duplicate entry data, which contains the additional data field entry_replacedby (this field indicates that the entry is a duplicate that has been merged)
- Accessing the correct entry returns the correct entry data, which may also contain the data field entry_replaces for backlinking purposes
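The behavior described above can be approximated with the following Python sketch. It is illustrative only: the in-memory store, the function names overview_list, resolve and handle_request, the integer IDs, and the status code 301 are all assumptions (this page only says "HTTP redirect"). Only the field names entry_replacedby and entry_replaces come from the documentation. Following a chain of replacements, rather than a single hop, is a defensive assumption as well.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical store: entry ID -> entry record, loosely modeled on API output.
Store = Dict[int, dict]


def overview_list(store: Store) -> List[int]:
    """IDs shown in overview lists: merged duplicates are hidden."""
    return sorted(i for i, e in store.items() if e.get("entry_replacedby") is None)


def resolve(store: Store, entry_id: int, max_hops: int = 10) -> int:
    """Follow entry_replacedby links until reaching an entry that is not replaced."""
    current = entry_id
    for _ in range(max_hops):
        target = store[current].get("entry_replacedby")
        if target is None:
            return current
        current = target
    raise ValueError(f"replacement chain too long or cyclic starting at {entry_id}")


def handle_request(store: Store, entry_id: int) -> Tuple[int, Optional[int]]:
    """Return (HTTP status, redirect target) for a web request to an entry page."""
    canonical = resolve(store, entry_id)
    if canonical != entry_id:
        # 301 is an assumed status code; the documentation only says "HTTP redirect".
        return 301, canonical
    return 200, None
```

A client consuming the API could call resolve() before storing a reference to an entry, so that its own records always point at the surviving entry rather than the obsolete duplicate.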
CVE Duplicates
If a CVE is a duplicate, we handle it as follows:
- If we are the responsible CNA, we will flag the duplicate CVE as such and revoke it
- If we are not the responsible CNA, we will add the duplicate CVE to our existing entry
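The decision above depends only on which CNA is responsible for the duplicate CVE. A minimal Python sketch follows. The CveAction enum, the handle_duplicate_cve function, and identifying the responsible CNA by a plain name string (with "VulDB" as the default for our own CNA) are hypothetical choices made for illustration.

```python
from enum import Enum


class CveAction(Enum):
    # The two outcomes described in this section.
    REVOKE = "flag the duplicate CVE as such and revoke it"
    ADD_TO_ENTRY = "add the duplicate CVE to our existing entry"


def handle_duplicate_cve(responsible_cna: str, our_cna: str = "VulDB") -> CveAction:
    """Choose how to handle a duplicate CVE based on the responsible CNA."""
    # Comparing CNAs by name is an illustrative assumption.
    if responsible_cna == our_cna:
        return CveAction.REVOKE
    return CveAction.ADD_TO_ENTRY
```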
Split Methodology
Please also consider our split methodology, which may affect duplicate handling.