| Field | Value |
|---|---|
| RFD Submitter | Andrew Lilley Brinker |
| RFD Pull Request | RFD #0000 |
Today, CVE supports identifying affected products or packages using three "identifier-like" constructs, with one more proposed in RFD #2, "Supporting Package URLs in CVE". They are:
- CPE, Common Platform Enumeration
- Vendor and product names, provided as a pair
- Collection URL and package names, provided as a pair
- (If accepted) Package URLs, also called "purls"
While these coarse-grained identifiers are great for identifying affected products or packages, they are insufficiently granular for identifying affected artifacts. This makes it difficult for CNAs to report fine-grained applicability information when they otherwise could.
For example, a CNA may know that specific binaries they build and ship to users are affected by a vulnerability. Today, there is not a clear, structured mechanism for reporting identifiers for these affected binaries to CVE consumers.
This RFD proposes introducing support for reporting affected artifacts, by
adding a new optional affectedArtifacts field to containers.cna, which
would contain an array of objects specifying identifiers for artifacts affected
by a vulnerability.
While CVE records today can contain substantial information about affected products or packages, there isn't a clear and structured way to report information about specific artifacts affected or not affected by a vulnerability.
This deficiency means CNAs who publish artifacts—such as prebuilt binaries,
archive files such as .zips or .tar.gzs, script files, or configuration
files—lack a means to communicate when those artifacts are known to be
vulnerable or to not be vulnerable.
For vulnerability managers, reacting to vulnerability disclosures with
coarse-grained identifiers for affected software requires maintaining accurate
software inventories, whether through Software Bills of Material, package
manifests (such as package.json or Cargo.toml), lockfiles (such as
package-lock.json or Cargo.lock), or other means. Without some method for
tracking what software is deployed in a production system, vulnerability
managers may struggle to turn identifiers provided in a CVE record into a clear
determination of applicability, and therefore also to respond quickly to
vulnerabilities when they're disclosed. Reducing the time-to-react for
vulnerability managers is a clear equity for the CVE program.
The presence of artifact identifiers in CVE Records would provide an additional mechanism to vulnerability managers to identify applicable vulnerabilities. For example, a hash of a known-vulnerable binary could be searched for on production systems in addition to any deployed software inventories.
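As a minimal sketch of the search described above, the following Python hashes every regular file under a directory and reports those whose SHA-256 digest appears in a set of known-vulnerable hashes. The function names, and the idea of passing in a hash set already extracted from CVE Records, are illustrative assumptions and not part of this proposal.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_vulnerable(root: Path, known_bad: set[str]) -> list[Path]:
    """List regular files under `root` whose SHA-256 digest is in `known_bad`."""
    matches = []
    for candidate in sorted(root.rglob("*")):
        if candidate.is_file() and sha256_of_file(candidate) in known_bad:
            matches.append(candidate)
    return matches
```

Reading in chunks keeps memory use flat for large binaries. A real deployment would likely also need to handle unreadable files and look inside archives, which this sketch does not attempt.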
Artifact identifiers also have the benefit of low false-positive matches.
Coarse-grained identifiers for products or packages may be decomposed further
with additional fields for objects in the affected array, such as platforms,
versions, programFiles, programModules, and more. These fields, and the
potential for ambiguity or complexity for checking in many of them, mean that
coarse-grained identifiers' applicability decisions can easily become complex
and require human intervention to assess, and even remain uncertain despite
human intervention.
By comparison, identifiers for affected artifacts, which are often based on hashing file contents, are unlikely to produce false positives. Cryptographic hash functions are designed to provide collision resistance, preimage resistance, and second-preimage resistance, which makes it impractical to construct a different file with the same identifier. As a result, if a vulnerability manager finds a file in their system whose artifact identifier matches an artifact identifier provided in a CVE Record, that manager can act quickly with high confidence that the match is correct.
Because of their low false-positive rate and content-based construction, artifact identifiers are also easy to automate and check at scale.
The following is the actual proposed change for the Record Format:
Add an affectedArtifacts field to the cnaPublishedContainer object, found
at the path containers.cna within a CVE Record. This new field would be an
array containing affectedArtifact objects. The specific edits to the schema
would be as follows:
First, the introduction of the affectedArtifacts field within the
cnaPublishedContainer object:
"affectedArtifacts": {
"type": "array",
"description": "List of affected artifacts.",
"minItems": 1,
"items": {"$ref": "#/definitions/affectedArtifact"}
}
Second, the definition of the affectedArtifact type within the "definitions"
portion of the schema:
"affectedArtifact": {
"type": "object",
"description": "Provides information about a specific artifact affected by a vulnerability.",
"allOf": [
{
"description": "An identifier-like field, to identify the artifact.",
"anyOf": [
{"required": ["omniborArtifactID", "omniborArtifactType"]},
{"required": ["sha256"]}
]
},
{
"description": "The status of the artifact.",
"anyOf": [
{"required": ["status"]}
]
}
],
"properties": {
"omniborArtifactID": {
"type": "string",
"pattern": "^gitoid:blob:sha256:[0-9a-f]{64}$",
"description": "The OmniBOR Artifact ID of the artifact to be matched against.",
"examples": [
"gitoid:blob:sha256:9f64df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
"gitoid:blob:sha256:09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772",
"gitoid:blob:sha256:230f3515d1306690815bd9c3da0d15d8b6fcf43894d17100eb44b6d329a92f61"
]
},
"omniborArtifactType": {
"type": "string",
"enum": ["artifact", "buildInput"],
"description": "Specifies how consumers of the Artifact ID should search for matches. If the value is 'artifact', then the Artifact ID identifies an artifact which should be searched for directly (for example, within a file system by matching against Artifact IDs for files). If the value is 'buildInput', then the Artifact ID identifies a build input, and consumers should match the Artifact ID against IDs found in OmniBOR Input Manifests for their software."
},
"sha256": {
"type": "string",
"pattern": "^[a-f0-9]{64}$",
"description": "The SHA-256 hash of the artifact.",
"examples": [
"68e656b251e67e8358bef8483ab0d51c6619f3e7a1a9f0e75838d41ff368f728",
"2cc620f8a156b986806bc2757c0572d978d8cbfc4d25f0dfa7c552291bf68279",
"97272dc1b6ac7ca84735b797b4a04233b17fd55707f9c728fc3747e3f935f02c"
]
},
"status": {
"description": "The vulnerability status of the identified artifact.",
"$ref": "#/definitions/status"
},
"version": {
"description": "The single version being described, or the version at the start of the range. By convention, typically 0 denotes the earliest possible version.",
"$ref": "#/definitions/version"
},
"versionType": {
"type": "string",
"description": "The version numbering system used for specifying the range. This defines the exact semantics of the comparison (less-than) operation on versions, which is required to understand the range itself. 'Custom' indicates that the version type is unspecified and should be avoided whenever possible. It is included primarily for use in conversion of older data files.",
"minLength": 1,
"maxLength": 128,
"examples": [
"custom",
"git",
"maven",
"python",
"rpm",
"semver"
]
},
"platforms": {
"description": "List of specific platforms if the vulnerability is only relevant in the context of these platforms (optional). Platforms may include execution environments, operating systems, virtualization technologies, hardware models, or computing architectures. The lack of this field implies that the other fields are applicable to all relevant platforms.",
"type": "array",
"minItems": 1,
"uniqueItems": true,
"items": {
"type": "string",
"examples": ["iOS", "Android", "Windows", "macOS", "x86", "ARM", "64 bit", "Big Endian", "iPad", "Chromebook", "Docker", "Model T"],
"maxLength": 1024
}
}
}
}
The explanations of the fields for affectedArtifact objects are as follows:
- omniborArtifactID: An OmniBOR Artifact Identifier, used to identify either an artifact itself, such as a binary file, or a build input used to produce the artifact.
- omniborArtifactType: The type associated with the omniborArtifactID field; can be either "artifact" or "buildInput". If "artifact" is used, then the omniborArtifactID field is the Artifact ID of an artifact itself, such as a binary file. If "buildInput" is used, then it is the Artifact ID of a build input. This field indicates to CVE consumers how to use the omniborArtifactID field. For artifacts, they should search their systems and/or inventories for files with a matching Artifact ID. For build inputs, they should search their OmniBOR Input Manifests for IDs which match.
- sha256: The SHA-256 hash of the artifact in question.
- status: Indicates whether the identified artifact is affected, not affected, or has an unknown affected status.
- version: The version applicable to the identified artifact, if relevant.
- versionType: If "version" is used, this indicates what type of version is present, and should be used by CVE consumers to validate and interpret the "version" field.
- platforms: A list of platforms, describing the specific platforms the identified artifact is intended for.
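To illustrate how the two identifier fields relate, here is a hedged Python sketch that derives both from the same bytes. It assumes an OmniBOR Artifact ID is built the way Git builds a blob object ID (a `blob <length>\0` header followed by the content, hashed with SHA-256) and that no content normalization is applied. The OmniBOR specification is the authority on the exact construction, and the function names here are hypothetical.

```python
import hashlib


def omnibor_artifact_id(content: bytes) -> str:
    """Assumed construction: SHA-256 over a Git-style blob header plus content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return "gitoid:blob:sha256:" + hashlib.sha256(header + content).hexdigest()


def plain_sha256(content: bytes) -> str:
    """SHA-256 over the raw content, as used by the sha256 field."""
    return hashlib.sha256(content).hexdigest()
```

Under this assumption the two digests differ for the same file because of the header, so one cannot be substituted for the other when matching.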
Additionally, the data constraints on the affectedArtifact object ensure that
at least one set of identifier-like fields is present per object, and that each
object always includes a "status" field.
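The constraints above can be sketched as a small hand-written check in Python. This is an illustration of the rules only, not a replacement for validating against the schema itself. The function name and error strings are hypothetical, and the allowed values of "status" are left to the existing status definition and are not checked here.

```python
import re

# Patterns copied from the proposed schema, without anchors because
# fullmatch() already requires the whole string to match.
OMNIBOR_ID = re.compile(r"gitoid:blob:sha256:[0-9a-f]{64}")
SHA256_HEX = re.compile(r"[a-f0-9]{64}")
ARTIFACT_TYPES = {"artifact", "buildInput"}


def _is_match(pattern: re.Pattern, value: object) -> bool:
    return isinstance(value, str) and pattern.fullmatch(value) is not None


def validate_affected_artifact(entry: dict) -> list[str]:
    """Return constraint violations; an empty list means the entry passes."""
    problems = []
    has_omnibor = "omniborArtifactID" in entry and "omniborArtifactType" in entry
    if not (has_omnibor or "sha256" in entry):
        problems.append("no complete identifier")
    if "status" not in entry:
        problems.append("missing status")
    if "omniborArtifactID" in entry and not _is_match(
        OMNIBOR_ID, entry["omniborArtifactID"]
    ):
        problems.append("malformed omniborArtifactID")
    if "omniborArtifactType" in entry:
        kind = entry["omniborArtifactType"]
        if not (isinstance(kind, str) and kind in ARTIFACT_TYPES):
            problems.append("unknown omniborArtifactType")
    if "sha256" in entry and not _is_match(SHA256_HEX, entry["sha256"]):
        problems.append("malformed sha256")
    return problems
```

Note that an omniborArtifactID without an omniborArtifactType does not count as a complete identifier, mirroring the paired "required" entry in the schema.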
Note that identifiers found in the same affectedArtifact object should be
interpreted as synonyms, identifying the same artifact. For example, an entry
in the affectedArtifacts array which contains both an omniborArtifactID
and sha256 value should be interpreted as identifying only one artifact,
for which either identifier is valid. The presence of multiple identifiers is
intended only to make matching easier for CVE consumers by providing them with
options which may be more convenient depending on what identifiers or tooling
the consumer has available in their systems to support matching.
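One way a consumer might apply this synonym rule is sketched below in Python: a local file matches an entry if any identifier the consumer has computed equals the corresponding identifier in the entry. The function name and the shape of `local_ids` (a dictionary keyed by the same field names) are assumptions made for illustration.

```python
def entry_matches(entry: dict, local_ids: dict) -> bool:
    """True if any identifier in `entry` equals the locally computed one.

    The OmniBOR ID is compared against the file only when the entry marks it
    as an "artifact"; "buildInput" IDs belong in an Input Manifest lookup,
    which this sketch does not cover.
    """
    candidates = ["sha256"]
    if entry.get("omniborArtifactType") == "artifact":
        candidates.append("omniborArtifactID")
    return any(
        key in entry and key in local_ids and entry[key] == local_ids[key]
        for key in candidates
    )
```

A consumer with only SHA-256 tooling and a consumer with only OmniBOR tooling would both find the same artifact, which is the convenience the paragraph above describes.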
This proposal is intended as a template for the introduction of more
fine-grained identifier types intended for identifying artifacts in the future.
Specifically, future identifiers should be added as new fields within the
affectedArtifact object inside the affectedArtifacts array.
To ensure consistency about new identifier types added, the CVE project should "vendor," meaning maintain its own public copy of, any relevant specifications when those specifications are not versioned upstream.
The following is an example affectedArtifacts field, identifying three
binaries, one for each of macOS, Windows, and Linux systems on x86:
"affectedArtifacts": [
{
"omniborArtifactID": "gitoid:blob:sha256:9f64df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
"omniborArtifactType": "artifact",
"sha256": "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
"status": "affected",
"version": "0.18.1",
"versionType": "semver",
"platforms": ["macOS", "x86"]
},
{
"omniborArtifactID": "gitoid:blob:sha256:4043df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
"omniborArtifactType": "artifact",
"sha256": "40414dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
"status": "affected",
"version": "0.18.1",
"versionType": "semver",
"platforms": ["Windows", "x86"]
},
{
"omniborArtifactID": "gitoid:blob:sha256:ccc4df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
"omniborArtifactType": "artifact",
"sha256": "ddd24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
"status": "affected",
"version": "0.18.1",
"versionType": "semver",
"platforms": ["Linux", "x86"]
}
]
The addition of this new field would enable CNAs to report affected artifacts, such as known-vulnerable prebuilt binaries shipped for versions of software affected by a vulnerability, and would be complementary to the existing ability in the Record Format to identify affected products and packages.
For CVE consumers, the addition of this field would provide the ability to search for the presence of known-vulnerable artifacts in their systems when reported by CNAs.
This would be a minor change, as the addition of new optional fields is considered non-breaking.
CVE consumers could, if they wanted, gain the benefit of the new field by
updating their consumption logic to recognize the field and make use of its
contents. CVE consumers would only be broken if they incorrectly assume in their
consumption logic that no new optional fields will ever be added to the
cnaPublishedContainer object.
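A tolerant consumer might read the new field along the following lines. This Python sketch treats the field as optional and ignores anything it does not recognize, so records published before and after the change are handled the same way. The function name is hypothetical.

```python
def affected_artifacts(record: dict) -> list[dict]:
    """Return the affectedArtifacts entries of a CVE Record, or [] if absent."""
    cna = record.get("containers", {}).get("cna", {})
    artifacts = cna.get("affectedArtifacts", [])
    if not isinstance(artifacts, list):
        return []
    return [item for item in artifacts if isinstance(item, dict)]
```

Because the lookup never assumes the field exists and never rejects unknown sibling fields, the same code keeps working if further optional fields are added later.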
The success of this proposal will depend on the adoption of the new field, and the degree to which the new field provides value for CVE consumers.
CNA adoption can be measured in reported CVEs. Six months after the publication of the first Record Format version to include the new field, the QWG must assess the prevalence of the new field in CVEs published during those 6 months. If the new field is present in at least 5% of new CVEs, this RFD will be considered successful and the new field will not be rolled back.
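The adoption check could be computed roughly as follows. This Python sketch assumes the records passed in have already been filtered to the relevant 6-month window, counts a record as using the field only when affectedArtifacts is present and non-empty, and reads the 5% figure as a minimum. These readings and the function names are assumptions.

```python
def adoption_rate(records: list[dict]) -> float:
    """Fraction of records whose CNA container has a non-empty affectedArtifacts."""
    if not records:
        return 0.0
    using = sum(
        1
        for record in records
        if record.get("containers", {}).get("cna", {}).get("affectedArtifacts")
    )
    return using / len(records)


def meets_threshold(records: list[dict], threshold: float = 0.05) -> bool:
    """True when adoption is at or above the threshold (5% by default)."""
    return adoption_rate(records) >= threshold
```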
CVE may consider making inclusion of affected artifacts a requirement for CNA recognition with the Enrichment Recognition List.
Measuring use by CVE consumers is a significantly larger challenge. A potential path would be to interview vulnerability management tool vendors, since many of these ingest and process the CVE list. Enquiring as to the role affected artifacts play in their processes would provide a strong indication of the value these identifiers provide. Of course, it will take vendors some time to adjust their processes. As such, the measure might be to look for at least two vendors using the new software identifier formats within a year of the adoption of the new formats.
Demand for OmniBOR was identified specifically in the most recent CVE user survey: Question 16 showed positive demand, strongest among self-identified data aggregators and integrators.
More generally, demand for identifying affected artifacts in CVE is unclear. Beyond the question on future priorities, which included OmniBOR, there were no specific questions in the survey around demand for identifying affected artifacts.
That said, this lack of support has been identified as a gap in discussions among the QWG, and there is interest in addressing it, whether through this proposal or a future alternative proposal.
None identified.
Medium
There are no remaining unresolved questions.
More identifier types may be desirable to add in the future. Any question of what those types may be, or what they may look like within the CVE Record Format, is not addressed here.