In our previous post What about the latency, stupid we showed how an erasure code is a valuable tool for any latency sensitive application (video conferencing, online gaming, etc.). In this post we will look at a popular family of erasure coding algorithms and show how choosing the wrong algorithm might increase latency instead of decreasing it.
The family of algorithms we will look at are called block codes. This is by far the most common type of erasure coding algorithm and therefore has the most widespread use. Algorithms such as Reed Solomon, LDPC, RaptorQ all fall into this category. Newer algorithms such as RLNC can operate not only as a block code, but also as a sliding window code (making it even more suitable for low latency applications - however we will save this discussion for a later post).
Before we dive in, let’s quickly recap the basic concept behind an erasure code. In a communication network an erasure code is used to protect a communication flow between two entities (see figure below). This is done by interleaving repair traffic with the normal traffic. If any data packets are lost along the way, the repair packets can be used to reconstruct the lost packets - without the need for retransmission.
How the repair traffic is generated is determined by the algorithms used within the encoder and decoder. Block codes work by collecting and grouping input packets into blocks, when a block is full a new block is generated. From each block repair packets can then be generated.
The impact on latency
So how does this affect latency? Fundamentally there are two properties of the code that directly impact latency:
- Property #1: The block size is essentially the amount of data which the algorithm operates over.
- Property #2: Whether the algorithm is capable of generating repair packets before the block is full.
Let’s first ignore property #2, and assume that all algorithms generate their redundancy once the block is full (which is the most common mode of operation - except for RLNC). As shown in the following figure choosing a large block size will increase the time between repair packet groups.
The consequence of this is that the potential latency induced by a packet loss increases. For example, say we lose the very first packet. In both cases we will repair this loss after receiving the first repair packet:
- In case #1 this happens when receiving packet 5
- In case #2 this happens when receiving packet 17
In practice the number of packets included in a single block vary - certain algorithms such as Reed Solomon perform best when the number of packets in a block is small whereas other algorithms such as LDPC and RaptorQ perform best when the block is large. RLNC can be tuned to operate well in both regimes.
The following plot visualizes how the worst case latency builds as a function of the block size given an 5 Mbit/s data stream and a packet size of 1280 bytes. In this example if we have more than 500 packets in our block, we need to wait at least one second before the first repair packet can be generated - this would be unacceptable for most latency sensitive applications.
Keep in mind that if the data rate was lower than 5 Mbit/s it would take even longer to fill the block and thus be able to generate the first repair packet.
Tradeoff between small and large blocks
However, going to very small blocks may not be optimal either. The tradeoff in this case is with the overall reliability of the stream. Since a specific block is only protected by repair packets generated from that block. In case #1 every block of four symbols is protected by 2 repair symbols. Whereas in case #2 all 16 symbols are protected by 8 repair symbols. Therefore we need to be very careful when selecting the block size, since:
- Selecting a very large block can result in large latency overhead - even more than what would be induced from a retransmission
- Selecting too small a block may be insufficient to protect against the required amount of packet loss.
New algorithms to the rescue
This is where property #2 comes in. If supported by the algorithm, we can produce a repair packet pattern yielding low latency, while getting the same protection for each source packet as with the larger block.
Of the algorithms mentioned in the introduction: Reed Solomon, LDPC, RaptorQ, and RLNC - only RLNC supports dynamically generating repair packets. Due to its flexible encoding algorithm even irregular repair packet patterns can be generated in response to fluctuating packet loss.
This makes RLNC an excellent erasure code for latency sensitive applications, because it can both introduce repair packets to minimize latency at the receiver and simultaneously improve reliability of packets across a large block of packets.
As an example you can see how RLNC is used to improve the user experience of a latency sensitive video game streaming service - you can find the video on YouTube here.
If you want to experiment with erasure codes for your use-case - or have a chat about how they could improve your users’ quality of experience get in touch at firstname.lastname@example.org and we will be happy to show you the light.