Vulnerability deduplication process

Tier: Free, Premium, Ultimate
Offering: GitLab.com, GitLab Self-Managed, GitLab Dedicated

When a pipeline contains jobs that produce multiple security reports of the same type, the same vulnerability finding can appear in more than one report. This duplication is common when different scanners are used to increase coverage, but it can also occur within a single report. Deduplication lets you maximize vulnerability scanning coverage while reducing the number of findings you need to manage.

A finding is considered a duplicate of another finding when their scan type, location, and one or more of their identifiers are the same.

The scan type must match because each scan type can define the location of a vulnerability differently. For example, static analyzers can locate a file path and line number, whereas a container scanning analyzer uses the image name instead.

When comparing identifiers, GitLab ignores CWE and WASC identifiers during deduplication because they are "type identifiers" used to classify groups of vulnerabilities. Including them would cause many findings to be incorrectly considered duplicates. Two findings are considered unique if none of their remaining identifiers match.
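As a rough illustration, the matching rule can be written as a pairwise check. This is a minimal sketch, not GitLab's implementation: it assumes identifiers are plain strings such as CVE-2022-25510, and it treats any identifier whose prefix is CWE or WASC as a type identifier. The Finding class and the function names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: the identifier type is the text before the first "-".
TYPE_IDENTIFIER_PREFIXES = {"CWE", "WASC"}


@dataclass(frozen=True)
class Finding:
    scan_type: str
    location_fingerprint: str
    identifiers: tuple[str, ...] = ()


def comparable_identifiers(finding: Finding) -> set[str]:
    """Return the identifiers used for deduplication, dropping type identifiers."""
    return {
        ident
        for ident in finding.identifiers
        if ident.split("-", 1)[0].upper() not in TYPE_IDENTIFIER_PREFIXES
    }


def is_duplicate(a: Finding, b: Finding) -> bool:
    """True when scan type and location match and at least one non-type identifier is shared."""
    return (
        a.scan_type == b.scan_type
        and a.location_fingerprint == b.location_fingerprint
        and bool(comparable_identifiers(a) & comparable_identifiers(b))
    )
```

Because type identifiers are filtered out before the intersection, two findings that share only a CWE never match, while one shared CVE is enough.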

In a set of duplicate findings, the first occurrence is kept and the remaining duplicates are skipped. Security reports are processed in alphabetical order of their file paths, and the findings in each report are processed sequentially in the order they appear.
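The ordering rules can be sketched as a single pass over all reports. This is a hypothetical illustration, not GitLab's code: it assumes each report is given as a file path mapped to its findings in report order, and it records (scan type, location fingerprint, identifier) keys for every finding it keeps, so a later finding sharing any such key with an already kept finding is skipped. The names deduplicate, Finding, and TYPE_IDENTIFIER_PREFIXES are illustrative.

```python
from dataclasses import dataclass

# Hypothetical: type identifiers are recognised by their prefix.
TYPE_IDENTIFIER_PREFIXES = ("CWE-", "WASC-")


@dataclass(frozen=True)
class Finding:
    scan_type: str
    location_fingerprint: str
    identifiers: tuple[str, ...]


def deduplicate(reports: dict[str, list[Finding]]) -> list[Finding]:
    """Keep the first occurrence of each finding across all reports.

    `reports` maps a report file path to its findings in report order.
    """
    seen_keys: set[tuple[str, str, str]] = set()
    kept: list[Finding] = []
    # Reports are visited in alphabetical file path order.
    for path in sorted(reports):
        # Findings are visited in the order they appear in the report.
        for finding in reports[path]:
            keys = {
                (finding.scan_type, finding.location_fingerprint, ident)
                for ident in finding.identifiers
                if not ident.upper().startswith(TYPE_IDENTIFIER_PREFIXES)
            }
            if keys & seen_keys:
                # A finding already kept shares scan type, location, and an identifier.
                continue
            seen_keys |= keys
            kept.append(finding)
    return kept
```

Keying on individual identifiers keeps each lookup a set membership test rather than a comparison against every kept finding. A finding that carries only type identifiers produces no keys, so it is never treated as a duplicate.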

Deduplication examples

Example 1: matching identifiers and location, mismatching scan type.
- Finding
  - Scan type: dependency_scanning
  - Location fingerprint: adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
  - Identifiers: CVE-2022-25510
- Other Finding
  - Scan type: container_scanning
  - Location fingerprint: adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
  - Identifiers: CVE-2022-25510
- Deduplication result: no deduplication occurs because the scan type is different.

Example 2: matching location and scan type, mismatching type identifiers.
- Finding
  - Scan type: sast
  - Location fingerprint: adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
  - Identifiers: CWE-259
- Other Finding
  - Scan type: sast
  - Location fingerprint: adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
  - Identifiers: CWE-798
- Deduplication result: no deduplication occurs because CWE identifiers are ignored, which leaves no identifiers to compare.

Example 3: matching scan type, location, and an identifier.
- Finding
  - Scan type: container_scanning
  - Location fingerprint: adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
  - Identifiers: CVE-2019-12345, CVE-2022-25510, CWE-259
- Other Finding
  - Scan type: container_scanning
  - Location fingerprint: adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
  - Identifiers: CVE-2022-25510, CWE-798
- Deduplication result: deduplication occurs because all criteria match and type identifiers (CWE) are ignored. Only one identifier needs to match, in this case CVE-2022-25510.

You can find the location definitions for each scan type in gitlab/lib/gitlab/ci/reports/security/locations and gitlab/ee/lib/gitlab/ci/reports/security/locations.

For instance, for the container_scanning type, the location is defined by the Docker image name without its tag. However, if the image tag matches semver syntax and doesn't look like a Git commit hash, the tag remains part of the location, so the finding isn't considered a duplicate.

For example, the following locations are treated as duplicates:

registry.gitlab.com/group-name/project-name/image1:12345019:libcrypto3
registry.gitlab.com/group-name/project-name/image1:libcrypto3

However, the following locations are considered different:

registry.gitlab.com/group-name/project-name/image1:v19202021:libcrypto3
registry.gitlab.com/group-name/project-name/image1:libcrypto3
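The container scanning rule can be approximated as follows. This is a hypothetical sketch, not the logic in the locations classes referenced above: the two regular expressions are loose heuristics chosen only so that the documented examples behave as described, and the assumptions that the fingerprint data has the form image:package and is hashed with SHA-1 are illustrative. The function names are hypothetical.

```python
import hashlib
import re

# Hypothetical heuristics, chosen only to reproduce the documented examples.
HASH_LIKE = re.compile(r"^[0-9a-f]{7,64}$", re.IGNORECASE)
VERSION_LIKE = re.compile(r"^v?\d+(\.\d+)*([-+][0-9A-Za-z.-]+)?$")


def image_for_location(image: str) -> str:
    """Drop the image tag unless it looks like a version rather than a commit hash."""
    name, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:
        # No tag present; a ':' before the last '/' belongs to a registry port.
        return image
    if VERSION_LIKE.match(tag) and not HASH_LIKE.match(tag):
        # Version-like tag is kept, so differently tagged images stay distinct.
        return image
    return name


def location_fingerprint(image: str, package: str) -> str:
    """SHA-1 hex digest of '<image>:<package>' (assumed fingerprint format)."""
    data = f"{image_for_location(image)}:{package}"
    return hashlib.sha1(data.encode("utf-8")).hexdigest()
```

With these heuristics, the tag 12345019 consists only of hexadecimal digits and is treated as a commit hash, so it is stripped and the first pair of locations collapses to one fingerprint. The tag v19202021 is treated as version-like, so it is kept and the second pair stays distinct.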