Vulnerability deduplication process
- Tier: Free, Premium, Ultimate
- Offering: GitLab.com, GitLab Self-Managed, GitLab Dedicated
When a pipeline contains jobs that produce multiple security reports of the same type, it is possible that the same vulnerability finding is present in multiple reports. This duplication is common when different scanners are used to increase coverage, but can also exist in a single report. The deduplication process allows you to maximize the vulnerability scanning coverage while reducing the number of findings you need to manage.
A finding is considered a duplicate of another finding when their scan type and location are the same and they share at least one identifier.
The scan type must match because each can have its own definition for the location of a vulnerability. For example, static analyzers are able to locate a file path and line number, whereas a container scanning analyzer uses the image name instead.
When comparing identifiers, GitLab does not compare CWE and WASC during deduplication because they are "type identifiers" and are used to classify groups of vulnerabilities. Including these identifiers would result in many findings being incorrectly considered duplicates. Two findings are considered unique if none of their identifiers match.
In a set of duplicated findings, the first occurrence of a finding is kept and the remaining findings are skipped. Security reports are processed in alphabetical file path order, and findings are processed sequentially in the order they appear in a report.
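The rules above can be sketched in Python. This is an illustration only, not GitLab's implementation: the `Finding` class, the `is_duplicate` and `deduplicate` functions, and the prefix check used to detect type identifiers are hypothetical names and assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: type identifiers can be recognized by these name prefixes.
TYPE_IDENTIFIER_PREFIXES = ("CWE-", "WASC-")


@dataclass(frozen=True)
class Finding:
    """Hypothetical, simplified view of a finding in a security report."""
    scan_type: str
    location_fingerprint: str
    identifiers: tuple[str, ...]


def comparable_identifiers(finding: Finding) -> set[str]:
    """Return the identifiers that take part in deduplication (no CWE or WASC)."""
    return {
        identifier
        for identifier in finding.identifiers
        if not identifier.upper().startswith(TYPE_IDENTIFIER_PREFIXES)
    }


def is_duplicate(a: Finding, b: Finding) -> bool:
    """Same scan type, same location, and at least one shared non-type identifier."""
    return (
        a.scan_type == b.scan_type
        and a.location_fingerprint == b.location_fingerprint
        and bool(comparable_identifiers(a) & comparable_identifiers(b))
    )


def deduplicate(reports: dict[str, list[Finding]]) -> list[Finding]:
    """Keep the first occurrence of each finding.

    Reports are visited in alphabetical file path order, and findings in the
    order they appear in each report. Comparing each finding only against the
    findings already kept is a simplification made for this sketch.
    """
    kept: list[Finding] = []
    for path in sorted(reports):
        for finding in reports[path]:
            if not any(is_duplicate(finding, earlier) for earlier in kept):
                kept.append(finding)
    return kept
```

Because reports are sorted by path before processing, the finding that survives is the one from the report whose path sorts first, regardless of the order in which the jobs ran.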
Deduplication examples
- Example 1: matching identifiers and location, mismatching scan type.
  - Finding
    - Scan type: dependency_scanning
    - Location fingerprint: adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
    - Identifiers: CVE-2022-25510
  - Other Finding
    - Scan type: container_scanning
    - Location fingerprint: adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
    - Identifiers: CVE-2022-25510
  - Deduplication result: no deduplication occurs because the scan type is different.
- Example 2: matching location and scan type, mismatching type identifiers.
  - Finding
    - Scan type: sast
    - Location fingerprint: adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
    - Identifiers: CWE-259
  - Other Finding
    - Scan type: sast
    - Location fingerprint: adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
    - Identifiers: CWE-798
  - Deduplication result: no deduplication occurs because CWE identifiers are ignored, which leaves no matching identifier.
- Example 3: matching scan type, location, and an identifier.
  - Finding
    - Scan type: container_scanning
    - Location fingerprint: adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
    - Identifiers: CVE-2019-12345, CVE-2022-25510, CWE-259
  - Other Finding
    - Scan type: container_scanning
    - Location fingerprint: adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
    - Identifiers: CVE-2022-25510, CWE-798
  - Deduplication result: deduplication occurs because all criteria match, and type identifiers (CWE) are ignored. Only one identifier needs to match, in this case CVE-2022-25510.
You can find the location definitions for each scan type in gitlab/lib/gitlab/ci/reports/security/locations and gitlab/ee/lib/gitlab/ci/reports/security/locations.
For instance, for the container_scanning type, the location is defined by the Docker image name without the tag. However, if the image tag matches semver syntax and doesn't look like a Git commit hash, the finding isn't considered a duplicate.
For example, the following locations are treated as duplicates:
registry.gitlab.com/group-name/project-name/image1:12345019:libcrypto3
registry.gitlab.com/group-name/project-name/image1:libcrypto3
However, the following locations are considered different:
registry.gitlab.com/group-name/project-name/image1:v19202021:libcrypto3
registry.gitlab.com/group-name/project-name/image1:libcrypto3
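The tag handling above can be approximated with a small Python sketch. The exact patterns GitLab uses are not given here, so the `SEMVER_LIKE` and `HASH_LIKE` regular expressions and the `normalize_location` function are assumptions, chosen only so that they reproduce the two pairs of examples above. The sketch also assumes locations of the form `image:tag:package` or `image:package`, with no port number in the registry host.

```python
import re

# Assumption: a "semver-like" tag is an optional "v" followed by dotted numbers.
SEMVER_LIKE = re.compile(r"v?\d+(\.\d+)*")
# Assumption: a tag "looks like a Git commit hash" if it is 7 to 40 hex digits.
HASH_LIKE = re.compile(r"[0-9a-f]{7,40}", re.IGNORECASE)


def normalize_location(location: str) -> str:
    """Drop the image tag unless it is semver-like and not hash-like."""
    parts = location.split(":")
    if len(parts) == 2:
        # Already in "image:package" form, nothing to drop.
        return location
    if len(parts) != 3:
        raise ValueError(f"unexpected location format: {location!r}")

    image, tag, package = parts
    keep_tag = bool(SEMVER_LIKE.fullmatch(tag)) and not HASH_LIKE.fullmatch(tag)
    return f"{image}:{tag}:{package}" if keep_tag else f"{image}:{package}"


base = "registry.gitlab.com/group-name/project-name/image1"

# Tag "12345019" looks like a commit hash, so it is dropped and the
# two locations compare as equal.
same = normalize_location(f"{base}:12345019:libcrypto3") == normalize_location(f"{base}:libcrypto3")

# Tag "v19202021" is semver-like and not hash-like, so it is kept and the
# two locations remain different.
different = normalize_location(f"{base}:v19202021:libcrypto3") != normalize_location(f"{base}:libcrypto3")
```

Under these assumptions, both `same` and `different` evaluate to `True`, matching the two cases listed above.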