Skip to content

stomp: emit content-length header for bodies containing embedded NUL#1158

Closed
Devansh-567 wants to merge 1 commit into
openSUSE:masterfrom
Devansh-567:fix/stomp-null-body-serialization
Closed

stomp: emit content-length header for bodies containing embedded NUL#1158
Devansh-567 wants to merge 1 commit into
openSUSE:masterfrom
Devansh-567:fix/stomp-null-body-serialization

Conversation

@Devansh-567

Copy link
Copy Markdown
Contributor

Description

This PR addresses an edge-case serialization bug in the STOMP message writer where frame bodies containing binary payloads or embedded null (\0) bytes cannot safely round-trip. It ensures that a content-length header is dynamically generated whenever the payload demands it under the STOMP specification.

Problem

The current implementation of write_message serializes headers exactly as they are populated inside the Message::headers map. However, according to the STOMP protocol specification, a content-length header is explicitly mandatory if the frame body contains embedded null bytes.

When a frame with an embedded \0 in the body is sent without a content-length header:

  1. write_message writes out the raw bytes along with the trailing global frame terminator.
  2. When the downstream parser (or our own read_message) receives this frame, it hits the fallback evaluation branch because has_content_length evaluates to false.
  3. The parser executes getline(is, msg.body, '\0'), which treats the very first embedded null byte inside the payload as the end-of-frame delimiter.

This results in silent data truncation, corrupting binary or multiplexed payloads during a serialization round-trip.

How I Found This

While tracing the data path within the stomp module to ensure end-to-end data integrity for non-text payloads, I compared the streaming constraints in the reader against the writer.

I observed that read_message relies completely on two distinct parsing modes depending on the presence of the content-length header: fixed-size block reads (is.read) vs. delimiter tokenization (getline splitting on \0).

Looking at write_message, there was no automated fallback or validation mechanism ensuring that the writer switches to or enforces a fixed-size footprint layout if the body contains embedded nulls. To confirm this mismatch, I simulated a test case passing a payload structured with internal null boundaries. As suspected, the writer serialized the entire string, but the reader prematurely cut off the data at the first null character, confirming the round-trip parsing breakdown.

Solution

I updated write_message to safeguard compliance with the STOMP specifications:

  1. Header Tracking: Added a boolean tracking flag (has_content_length) during the primary header serialization loop to check if the user manually defined a length constraint.
  2. Dynamic Injection: Introduced a fallback conditional right before writing the message body block. If content-length was not explicitly configured and msg.body.find('\0') != string::npos indicates an embedded null character, the encoder automatically calculates and emits the appropriate content-length:<size> header line into the outgoing sequence stream.

This change guarantees that payloads containing arbitrary binary data can round-trip flawlessly without breaking downstream parsers that rely strictly on standard STOMP delimiter rules.

If any cosmetic changes needed please do let me know :)

@aschnell

Copy link
Copy Markdown
Member

OK, the zypper plugin does not respond with messages containing NUL.

Apart from that: If the Stomp class does not trust the user to insert a content-length if required, why should it trust the user to insert a content-length with the correct value?

@Devansh-567

Copy link
Copy Markdown
Contributor Author

OK, the zypper plugin does not respond with messages containing NUL.

Apart from that: If the Stomp class does not trust the user to insert a content-length if required, why should it trust the user to insert a content-length with the correct value?

That is a completely fair point regarding API responsibility.

My reasoning for the fallback is the structural difference between a user providing explicit (but incorrect) metadata versus the library performing a silently destructive default serialization:

  1. Explicit User Settings: If a caller manually sets a content-length header, they are taking explicit control over the frame's metadata. If they pass an incorrect value, it behaves as a standard user-side logic bug.
  2. Implicit Library Defaults: When a caller omits the header, they are relying on the library's automated serialization. For standard text, the default delimiter-based framing works perfectly. However, if the body contains embedded \0, the default serialization mechanism becomes structurally invalid under the STOMP spec, leading to silent data truncation on read.

The goal was defensive masking: ensuring the library's automated serialization path remains protocol-compliant even if binary data is passed implicitly.

However, since the zypper plugin doesn't exchange payloads containing NUL bytes, I completely understand if this edge case falls out of scope for snapper's immediate requirements. Would you prefer to unconditionally enforce/recalculate content-length across the board, or should we leave header management entirely to the caller and close this out?

@aschnell

Copy link
Copy Markdown
Member

Yes, all the zypper plugin does is AFAIS reply with ACK - no header, no body. This handling look unnecessary to me.

Anyway thanks for the review.

@aschnell aschnell closed this Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants