CoMP v1.0 Feedback: Structured Provenance Metadata for Content Assets #10

@erik-sv

CoMP v1.0 includes provenance and provent fields on all four asset objects (Text, Video, Image, Audio). This is the right instinct - provenance is a necessary signal for AI systems acquiring licensed content. The current implementation, however, limits provenance to a binary flag and a domain string. This makes provenance declarative rather than verifiable, which undermines the commercial enforcement CoMP is designed to support.

This issue proposes expanding provenance from two fields to a small structured set that enables AI systems to verify provenance programmatically, and that gives content owners a mechanism to prove the authenticity of what they are selling.

The gap

The current fields:

| Field | Type | Current behavior |
|---|---|---|
| provenance | int (0/1) | "Does provenance exist?" |
| provent | string | "Canonical domain of the entity providing the provenance (e.g. C2PA)" |

Three problems:

1. No verification path. An AI system receives provenance: 1 but has no way to verify the claim: there is no manifest URL, no hash, and no verification endpoint. The signal is self-declared by the content owner or marketplace. This mirrors the citation verification gap raised in #6.

2. Standard and signer are conflated. The provent field description uses "C2PA" as its example, but C2PA is a standard, not an entity. The entity that creates a provenance record is the signer - a news organization, a stock photo provider, a publisher. An AI system evaluating content authenticity needs to know both what standard was used and who signed the record. These are distinct signals that serve different trust decisions.

3. No connection to rights. Provenance standards like C2PA support embedded rights assertions within signed manifests. The current provenance fields have no way to indicate whether the provenance record contains licensing or rights metadata. This is a missed link to the commercial terms layer that #7 proposed structuring at the Package level.

Proposed changes

Replace the binary with a type enumeration

Rather than "does provenance exist," signal what kind of provenance is available. This follows the pattern already established by sourcetype (Human / AI / Hybrid) and auth (None / api_key / oauth2 / SSL / Other).

List: Provenance Type

| Value | Label |
|---|---|
| 0 | None |
| 1 | C2PA manifest |
| 2 | Digital watermark |
| 3 | Content fingerprint |
| 4 | Other |

The provenance field changes from int (0/1) to int (0-4), referencing this list. 0 retains backward compatibility with the current "no provenance" semantic.
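A minimal sketch of how a consumer might model the proposed list in Python. The names `ProvenanceType` and `parse_provenance` are illustrative and not part of CoMP, and mapping unrecognized values to `OTHER` is an assumption about forward compatibility that the proposal does not specify.

```python
from enum import IntEnum


class ProvenanceType(IntEnum):
    """Proposed Provenance Type list (values 0-4)."""
    NONE = 0
    C2PA_MANIFEST = 1
    DIGITAL_WATERMARK = 2
    CONTENT_FINGERPRINT = 3
    OTHER = 4


def parse_provenance(value: int) -> ProvenanceType:
    """Map a raw `provenance` integer to the proposed list.

    Assumption: values outside 0-4 are treated as OTHER rather than rejected.
    """
    try:
        return ProvenanceType(value)
    except ValueError:
        return ProvenanceType.OTHER
```

One thing this makes visible: a v1.0 payload carrying provenance: 1 ("provenance exists") would be read as "C2PA manifest" under the new list, so implementers may want to confirm that existing uses of 1 are in fact C2PA.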

Split provent into standard and signer

| Field | Type | Description |
|---|---|---|
| provstd | string | Identifier of the provenance standard used (e.g., "c2pa", "iptc") |
| provsigner | string | Canonical domain of the entity that created the provenance record (e.g., "examplenews.com", "examplephotos.com") |

Both are required when provenance > 0. The AI system uses provstd to select the correct verification method and provsigner to evaluate trust.
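The conditional requirement could be checked along these lines. This is a sketch only: the function name `provenance_field_errors` is hypothetical, and treating a missing `provenance` field as 0 is an assumption.

```python
def provenance_field_errors(asset: dict) -> list[str]:
    """Return problems with the proposed provenance fields on one asset."""
    errors: list[str] = []
    # Assumption: a missing `provenance` field is treated as 0 (None).
    if asset.get("provenance", 0) == 0:
        return errors
    # provstd and provsigner are both required when provenance > 0.
    for field in ("provstd", "provsigner"):
        value = asset.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{field} is required when provenance > 0")
    return errors


# Example: a C2PA asset that names the standard but not the signer.
incomplete = {"provenance": 1, "provstd": "c2pa"}
complete = {"provenance": 1, "provstd": "c2pa", "provsigner": "examplenews.com"}
```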

Add a verification URL

| Field | Type | Description |
|---|---|---|
| provurl | string | URL where the AI system can retrieve or verify the provenance record for this asset |

Optional. When present, the AI system can confirm the provenance claim before ingestion rather than trusting the declaration alone. This gives content owners a concrete enforcement mechanism: the provenance record is independently verifiable, and the AI system's access to that record creates an auditable chain.

Indicate embedded rights

| Field | Type | Description |
|---|---|---|
| provrights | int (0/1) | Indicates whether the provenance record contains embedded rights or licensing assertions. 0 = No, 1 = Yes. |

Optional. When provrights: 1, the AI system knows to inspect the provenance record for machine-readable licensing terms. This bridges the provenance layer to the commercial terms structure proposed in #7 - the Package defines the deal, and the provenance record carries cryptographic proof of the content owner's rights claim.

Updated example

The current Example 4 (Image Creation) uses "provenance": 1, "provent": "c2pa.org". With the proposed changes:

{
  "title": ["Sunset over Mediterranean Coast"],
  "url": "https://examplenews.com/assets/med-coast-001",
  "author": ["Maria Torres"],
  "sourcetype": 0,
  "provenance": 1,
  "provstd": "c2pa",
  "provsigner": "examplenews.com",
  "provurl": "https://examplenews.com/provenance/med-coast-001",
  "provrights": 1
}

The AI system can now:

  1. See that provenance exists and is a C2PA manifest (provenance: 1, provstd: "c2pa")
  2. Know who signed it (provsigner: "examplenews.com")
  3. Verify the claim before ingestion (provurl)
  4. Know that the manifest contains rights assertions (provrights: 1)

An asset without provenance remains unchanged: "provenance": 0 with all other provenance fields omitted.
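The four steps above could look roughly like this on the consuming side. Everything here is a sketch under assumptions: `evaluate_provenance`, the `trusted_signers` set, and the `verify` callable are hypothetical names, and `verify` stands in for whatever standard-specific check (for example, C2PA manifest validation) the AI system actually runs against provurl.

```python
from typing import Callable, Optional


def evaluate_provenance(
    asset: dict,
    trusted_signers: set[str],
    verify: Optional[Callable[[str], bool]] = None,
) -> dict:
    """Summarize what the proposed fields tell a consumer about one asset."""
    if asset.get("provenance", 0) == 0:
        # No provenance declared: all other provenance fields are omitted.
        return {"has_provenance": False, "signer_trusted": False,
                "verified": None, "inspect_rights": False}
    url = asset.get("provurl")
    # provurl is optional; without it (or without a verifier) the claim
    # stays unverified, which is reported as None rather than False.
    verified = verify(url) if (verify is not None and url) else None
    return {
        "has_provenance": True,
        "signer_trusted": asset.get("provsigner") in trusted_signers,
        "verified": verified,
        "inspect_rights": asset.get("provrights", 0) == 1,
    }


asset = {
    "provenance": 1,
    "provstd": "c2pa",
    "provsigner": "examplenews.com",
    "provurl": "https://examplenews.com/provenance/med-coast-001",
    "provrights": 1,
}
# Stub verifier for illustration; a real one would fetch and validate the record.
result = evaluate_provenance(asset, {"examplenews.com"}, verify=lambda url: True)
```

Keeping verification behind a callable reflects that the proposal leaves the verification method to the standard named in provstd.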

Relationship to other feedback

  • #7, "Feedback on CoMP v1.0: Publisher/Response Object Side" (Package as rights wrapper): provrights connects provenance to the Package-level commercial terms. The Package defines what the deal is; the provenance record proves the content owner's authority to make the deal.
  • #6, "SPUR Coalition Comment - Reporting/Identity" (Verification and reporting): The SPUR Coalition noted that citation is a binary field with no verification path. The same structural gap exists in provenance. provurl provides the verification path that provenance: 1 alone cannot.
  • #5, "Gap in licensing protocol" (Machine-readable licensing): That issue identified the need for automated proof-of-permission. Provenance records with embedded rights assertions (provrights: 1) are one mechanism for delivering that proof in a way that is interoperable across standards.

Scope

This proposal adds four fields and one enumeration list. It does not introduce new objects, change the request/response flow, or expand CoMP's scope beyond content metadata. The ext object could carry these fields as an interim measure before formal inclusion.
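As an illustration of the interim route, a consumer could look for the proposed fields at the top level of an asset and fall back to ext. The placement of the fields directly under ext with unchanged names, the top-level-wins precedence, and the helper name `read_provenance_fields` are all assumptions made for this sketch; the proposal does not define them.

```python
PROV_FIELDS = ("provstd", "provsigner", "provurl", "provrights")


def read_provenance_fields(asset: dict) -> dict:
    """Collect the proposed fields, preferring top level over ext."""
    ext = asset.get("ext") or {}
    found = {}
    for name in PROV_FIELDS:
        if name in asset:
            found[name] = asset[name]
        elif name in ext:
            # Interim placement (assumed): same field names inside ext.
            found[name] = ext[name]
    return found


interim_asset = {
    "provenance": 1,
    "ext": {"provstd": "c2pa", "provsigner": "examplenews.com"},
}
fields = read_provenance_fields(interim_asset)
```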


We work on content provenance infrastructure and would be glad to contribute implementation guidance or work directly with the CoMP Working Group on this if it would be useful.
