4 April 2024
by Mark Teisman
Transmission Control Protocol (TCP), unlike the User Datagram Protocol (UDP), is a stream-based protocol. It provides a reliable, ordered and error-checked stream of bytes. This means that there are no explicit boundaries between the pieces of data being transmitted. In this post, I'll discuss some strategies for delimiting messages in TCP, so that the recipient is able to parse discrete messages.
The first strategy is delimiter-based framing. With this protocol, a message (or command) is delimited by a specific sequence of bytes (e.g. \n or some other unique string). The receiver continues to read from the stream until it encounters this delimiter, at which point it knows it can process the message.
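A minimal sketch of the receiving side in Python, assuming a newline delimiter and a connected socket conn (both names are illustrative):

```python
def recv_messages(conn, delimiter=b"\n"):
    """Yield complete messages from a connected socket, split on a delimiter."""
    buffer = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # peer closed the connection
            break
        buffer += chunk
        # A single recv() may carry zero, one, or several delimiters.
        while delimiter in buffer:
            message, _, buffer = buffer.partition(delimiter)
            yield message
```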
A problem with this approach is that the chosen sequence of bytes may naturally occur in the "body" of the message. This can be mitigated by escaping such occurrences, for example by inserting a byte sequence (escape characters) before the delimiter. This technique is also called byte stuffing. The protocol then specifies that the receiver must remove these escape characters upon receipt.
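A sketch of byte stuffing, assuming \n as the delimiter and a backslash as the escape character (both choices are assumptions, not taken from any particular protocol):

```python
DELIMITER = b"\n"
ESCAPE = b"\\"

def stuff(payload: bytes) -> bytes:
    """Escape escape bytes and delimiters, then append the framing delimiter."""
    escaped = payload.replace(ESCAPE, ESCAPE + ESCAPE)
    escaped = escaped.replace(DELIMITER, ESCAPE + DELIMITER)
    return escaped + DELIMITER

def unstuff(message: bytes) -> bytes:
    """Reverse the escaping, given a message whose framing delimiter is already stripped."""
    result = bytearray()
    i = 0
    while i < len(message):
        if message[i:i+1] == ESCAPE:
            i += 1  # skip the escape byte, keep the byte it protects
        result += message[i:i+1]
        i += 1
    return bytes(result)
```

Note that escaping complicates the receiver: it must now split only on delimiters that are not preceded by an odd number of escape bytes, so the naive split in the earlier sketch would no longer be sufficient.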
For example, SMTP (Simple Mail Transfer Protocol) uses delimiters and terminators to structure and manage the communication of email data over TCP. SMTP uses the combination of a Carriage Return (CR) and Line Feed (LF) - CRLF in short - as its line terminator. The protocol uses the <COMMAND> <BODY><CRLF> convention (e.g. HELO example.com\r\n). The DATA command (which is used to transfer the body of the email) is different: the mail data it introduces is terminated by a <CRLF> followed by a terminating dot (.) and then another <CRLF>. To prevent situations where a <CRLF>.<CRLF> that naturally occurs in the body of an email would prematurely terminate the message, SMTP uses dot-stuffing. When a line in the email content begins with a dot, the client adds an extra dot at the beginning of that line. This way, a single dot (.) becomes two dots (..), and the SMTP server does not misinterpret it as the end of the message. Naturally, when the server then reads a line that begins with a dot, it has to remove that leading dot to reconstruct the original message. Check out IETF RFC 5321 (SMTP) if you're interested in learning more about the protocol.
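A sketch of dot-stuffing as a client might apply it, assuming the mail body has already been split into lines without their CRLF terminators:

```python
def dot_stuff(lines):
    """Client side: prefix an extra dot to any line that starts with a dot."""
    return [b"." + line if line.startswith(b".") else line for line in lines]

def dot_unstuff(lines):
    """Server side: strip the leading dot from any line that starts with a dot."""
    return [line[1:] if line.startswith(b".") else line for line in lines]

body = [b"Dear reader,", b".a line starting with a dot", b"Bye"]
wire = dot_stuff(body)            # the second line becomes b"..a line starting with a dot"
assert dot_unstuff(wire) == body  # the transformation round-trips
```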
The second strategy is length-prefixing. With this protocol, messages are composed of a header and a body. The header (e.g. the first X bytes) has a fixed size and holds data including the size of the body that follows. After parsing the header, the receiver knows that once it has read that many additional bytes, it can process the message. This protocol supports bodies of variable size, up to the maximum length the header's length field can express.
For example, HTTP/2 uses this strategy. HTTP/2 communicates using frames. Each frame has a header which is 9 bytes in size. The first 3 bytes of the frame header specify the length of the frame payload in bytes; the subsequent bytes hold the Type, Flags and Stream Identifier fields. Check out IETF RFC 7540 (HTTP/2) for more details.
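A sketch of simple length-prefixed framing (not HTTP/2's exact layout), assuming a 4-byte big-endian length field and a connected socket conn:

```python
import struct

def send_message(conn, payload: bytes) -> None:
    # Prefix the payload with its length as a 4-byte big-endian integer.
    conn.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exactly(conn, n: int) -> bytes:
    """Read exactly n bytes, looping because recv() may return fewer."""
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        data += chunk
    return data

def recv_message(conn) -> bytes:
    (length,) = struct.unpack(">I", recv_exactly(conn, 4))
    return recv_exactly(conn, length)
```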
The third strategy uses fixed-length messages. In this protocol, there is a predefined contract between the sender and the receiver about the fixed size of messages, say X. The receiver will always read X bytes, and then process the message. If the actual message to be sent is Y bytes, and Y is smaller than X, then the sender will add padding to ensure it writes X bytes. This means there are inefficiencies when messages are of variable size. Also, because the message size is agreed upon out of band rather than carried in the stream itself, there is no way to change the message size X without introducing a breaking change.
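A sketch of fixed-size framing, assuming a message size of 64 bytes and NUL-byte padding (the padding scheme is an assumption; a real protocol would have to define how padding is told apart from data):

```python
MESSAGE_SIZE = 64  # the fixed size X, agreed upon out of band

def pad(payload: bytes) -> bytes:
    if len(payload) > MESSAGE_SIZE:
        raise ValueError("payload exceeds the fixed message size")
    return payload.ljust(MESSAGE_SIZE, b"\x00")  # right-pad with NUL bytes

def unpad(message: bytes) -> bytes:
    # Only safe if a payload can never legitimately end in NUL bytes.
    return message.rstrip(b"\x00")
```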
The fourth strategy relies on self-describing formats. Certain data formats describe the beginning and end of their own data structures. Think of the opening and closing braces {} in the JSON format. If the contract between sender and receiver is that one top-level structure equals one message, then the receiver is able to identify message boundaries. The downside of this strategy is that the receiver needs to parse the body as it's being read to know where the body ends.
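A sketch of the receiving side, using Python's json.JSONDecoder.raw_decode to find where each top-level JSON value ends in an accumulating buffer:

```python
import json

_decoder = json.JSONDecoder()

def extract_messages(buffer: str):
    """Return (complete top-level JSON values, unconsumed tail of the buffer)."""
    messages = []
    index = 0
    while True:
        # Skip whitespace between messages.
        while index < len(buffer) and buffer[index].isspace():
            index += 1
        if index == len(buffer):
            break
        try:
            obj, index = _decoder.raw_decode(buffer, index)
        except json.JSONDecodeError:
            break  # incomplete (or malformed) trailing data; wait for more bytes
        messages.append(obj)
    return messages, buffer[index:]

msgs, rest = extract_messages('{"a": 1}{"b": 2}{"c"')
# msgs == [{'a': 1}, {'b': 2}]; rest == '{"c"' is kept until more bytes arrive
```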
That was it: a quick survey of common approaches to detecting message boundaries in protocols that build on stream-based protocols such as TCP.