Hardware Acceleration


Erasure correcting codes (ECC/FEC) provide elegant solutions to many networking problems. Whether you want to build reliable, super low-latency applications or efficiently send data to thousands of devices simultaneously, ECC/FEC can provide compelling advantages. However, one common concern when integrating an ECC/FEC algorithm is the size of the computational overhead.

The unsatisfying answer is, as is often the case in engineering, that it depends. More specifically, it depends on the configuration of the algorithm: how much data is processed by the ECC/FEC encoder/decoder, which specific type of algorithm is used, and so on.

Although the configuration of the algorithm has a significant impact on performance, the quality of the implementation also needs to be taken into account. The key question is essentially: how efficient is the software implementation of the specific algorithm?

Things like avoiding unnecessary memory allocation and copying of data can make a big difference. Additionally, it is often possible to boost performance further by taking advantage of modern CPUs' SIMD (Single Instruction, Multiple Data) instructions.
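To illustrate the allocation point, here is a minimal sketch using a hypothetical encoder interface (not a real Steinwurf API): the symbol buffer is allocated once and reused for every coded symbol, instead of being allocated and copied inside the hot loop.

```cpp
// Minimal sketch, assuming a hypothetical encoder interface (not a real
// Steinwurf API). The point: allocate the symbol buffer once and reuse it.
#include <cstddef>
#include <cstdint>
#include <vector>

struct encoder
{
    std::size_t symbol_size() const { return 1400; } // hypothetical fixed size
    void encode_symbol(std::uint8_t* out) { (void)out; /* placeholder: write one coded symbol */ }
};

void produce_symbols(encoder& enc, std::size_t count)
{
    // Allocated once, outside the hot loop.
    std::vector<std::uint8_t> symbol(enc.symbol_size());

    for (std::size_t i = 0; i < count; ++i)
    {
        // Encode directly into the reused buffer: no per-symbol allocation
        // and no extra copy before handing the data to the network layer.
        enc.encode_symbol(symbol.data());
        // send_packet(symbol.data(), symbol.size());
    }
}
```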

Speeding up computations using SIMD

SIMD can be used to significantly speed up the computations of an ECC/FEC algorithm by processing data in parallel.

These computations, typically finite-field operations such as XOR and byte-wise multiplications over the data buffers, are fundamental to all ECC/FEC algorithms. SIMD acceleration is therefore not tied to a specific algorithm but only depends on whether the implementation supports it.
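To make the idea concrete, below is a minimal sketch (not Steinwurf's implementation) of the simplest case, GF(2): adding one data block into another is a plain XOR, and the SSE2 version processes 16 bytes per instruction instead of one. The measurements further down use SSSE3, AVX2 and NEON, but the principle is the same.

```cpp
// Minimal sketch (not Steinwurf's code): XOR one block into another, the
// basic GF(2) operation used by many ECC/FEC codes.
#include <cstddef>
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics

// Scalar reference: one byte per iteration.
void xor_block_scalar(std::uint8_t* dst, const std::uint8_t* src, std::size_t size)
{
    for (std::size_t i = 0; i < size; ++i)
        dst[i] ^= src[i];
}

// SSE2 version: 16 bytes per iteration (size assumed to be a multiple of 16).
void xor_block_sse2(std::uint8_t* dst, const std::uint8_t* src, std::size_t size)
{
    for (std::size_t i = 0; i < size; i += 16)
    {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst + i));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_xor_si128(a, b));
    }
}
```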

What is the impact of SIMD?

To provide some sense of the impact of SIMD, the following graphs show the raw throughput of a typical ECC/FEC operation on both ARM (Android) and x86 (Desktop) CPUs. We tested using the following configurations:

  • aarch64: 64bit ARM CPU without using acceleration

  • aarch64-neon: 64bit ARM CPU using NEON SIMD acceleration

  • x86: 64bit x86 CPU without using acceleration

  • x86-ssse3: 64bit x86 CPU using SSSE3 SIMD acceleration

  • x86-avx2: 64bit x86 CPU using AVX2 SIMD acceleration

[Graph: raw throughput (MB/s) of the ECC/FEC operation for each configuration]

The graph shows that the boost in performance from the non-accelerated version to the SIMD-accelerated operation can be several hundred MB/s.

If we look at the relative gain of adding SIMD acceleration, we see roughly a 5x speed-up on ARM and up to a 16x speed-up on x86!

[Graph: relative speed-up from SIMD acceleration for each configuration]

Clearly, we lose a significant amount of performance without SIMD acceleration. This can have a big impact when running the algorithms on battery-driven or resource-constrained devices, or in a shared environment like a cloud service where processes compete for valuable CPU time and every minute counts.

All of Steinwurf's ECC/FEC algorithms utilize SIMD acceleration with run-time detection of the CPU's capabilities. This means the same binary can run on many different CPUs and automatically utilize the fastest SIMD acceleration available.
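A minimal sketch of how such run-time dispatch can look is shown below. It is not Steinwurf's actual mechanism: it reuses the two hypothetical kernels from the earlier sketch and the GCC/Clang builtin __builtin_cpu_supports(), which is x86-specific. A real library would add SSSE3, AVX2 and NEON variants the same way and probe ARM features through a platform query such as getauxval on Linux.

```cpp
// Minimal sketch of run-time SIMD dispatch (not Steinwurf's actual code).
// Both kernels are compiled into the same binary; the fastest one supported
// by the CPU is selected once at run time.
#include <cstddef>
#include <cstdint>

// Kernels from the previous sketch (hypothetical names).
void xor_block_scalar(std::uint8_t* dst, const std::uint8_t* src, std::size_t size);
void xor_block_sse2(std::uint8_t* dst, const std::uint8_t* src, std::size_t size);

using xor_fn = void (*)(std::uint8_t*, const std::uint8_t*, std::size_t);

// GCC/Clang builtin, x86 only: pick the best kernel the CPU supports.
// Every later call goes straight to the selected function pointer.
xor_fn select_xor_kernel()
{
    return __builtin_cpu_supports("sse2") ? xor_block_sse2 : xor_block_scalar;
}
```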

To learn more about Steinwurf's ECC/FEC algorithms, or to discuss how we can help improve your ECC/FEC performance, feel free to reach out at contact@steinwurf.com.
