I've profiled my application with ANTS and found that more than 10% of the time is spent in CRC32 calculations. (The CRC32 calculation is done in plain C#.)
I did some googling and learned about the following intrinsics in Visual Studio 2008:
_mm_crc32_u8
_mm_crc32_u16
_mm_crc32_u32
_mm_crc32_u64
( http://msdn.microsoft.com/en-us/library/bb514036.aspx )
Can anyone tell me / show me how to use these to replace my homebrew CRC32?
CRC32 calculations have been getting faster over the years, partly because of implementation optimizations but also because new processor instructions have become available. Hence this new answer to an almost decade-old question!
Stephan Brumme's CRC32 page has an overview of optimizations with the last one dated 2016. FastCRC by Yuri Babich is a 2019 C# implementation of the fast C++ CRC32 algorithm "Slicing-by-16" by Stephan Brumme & Bulat Ziganshin. He claims his version is just a little bit slower (about 10%) than the native CLI C++ fast CRC32 implementation. This algorithm is the older CRC-32-IEEE.
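For orientation, here is a minimal C sketch of the plain table-driven, byte-at-a-time CRC-32-IEEE that the slicing variants build on. It is not code from Stephan Brumme's page or from FastCRC; the names `crc32_ieee` and `crc32_init_table` are made up for this illustration. Slicing-by-N keeps the same idea but uses N lookup tables to consume N bytes per loop iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the CRC-32-IEEE polynomial (the one used by zip and PNG). */
#define CRC32_IEEE_POLY 0xEDB88320u

static uint32_t crc32_table[256];
static int crc32_table_ready = 0;

/* Build the 256-entry table: entry i is the CRC remainder of the single byte i. */
static void crc32_init_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 1u) ? (c >> 1) ^ CRC32_IEEE_POLY : (c >> 1);
        crc32_table[i] = c;
    }
    crc32_table_ready = 1;
}

/* Hypothetical helper: one table lookup per input byte. */
uint32_t crc32_ieee(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;          /* standard initial value */

    if (!crc32_table_ready)
        crc32_init_table();              /* lazy init, not thread-safe */

    while (len--)
        crc = crc32_table[(crc ^ *p++) & 0xFFu] ^ (crc >> 8);

    return crc ^ 0xFFFFFFFFu;            /* standard final XOR */
}
```

Calling `crc32_ieee("123456789", 9)` should give `0xCBF43926`, the usual check value for this CRC variant, which is a quick way to confirm that a faster replacement still computes the same checksum.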
If you have the ability to choose another variant, go for CRC-32C (Castagnoli). This is available in the Crc32C.NET package.
The polynomial in CRC-32C was shown to have better error detection properties, which is the reason for its adoption in newer standards (iSCSI, SCTP, ext4). Aside from higher reliability, CRC-32C now has the advantage of a dedicated instruction on newer Intel processors. That's why it is being chosen for high-performance applications, for example the Snappy compression algorithm.
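The dedicated instruction is what the `_mm_crc32_u8` family of intrinsics from the question maps to. Note that it computes CRC-32C (Castagnoli), so it cannot reproduce the results of an existing CRC-32-IEEE routine. Below is a rough C sketch, assuming a compiler that defines `__SSE4_2__` when SSE4.2 is enabled (GCC and Clang do with `-msse4.2`; MSVC does not define it, so there the sketch takes the slow bitwise fallback). The function name `crc32c` is made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE4_2__)
#include <nmmintrin.h>   /* SSE4.2 intrinsics, including _mm_crc32_u8 */
#endif

/* Hypothetical helper: CRC-32C of a buffer, hardware-assisted when available. */
uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;          /* standard initial value */

#if defined(__SSE4_2__)
    /* One CRC32 instruction per byte. A tuned version would process
       4 or 8 bytes at a time with _mm_crc32_u32 / _mm_crc32_u64. */
    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
#else
    /* Portable bitwise fallback using the reflected Castagnoli polynomial. */
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : (crc >> 1);
    }
#endif

    return crc ^ 0xFFFFFFFFu;            /* standard final XOR */
}
```

`crc32c("123456789", 9)` should give `0xE3069283`, the usual CRC-32C check value, on both code paths. From C# these intrinsics are not directly callable; as far as I know, .NET Core 3.0 and later expose the same instruction through `System.Runtime.Intrinsics.X86.Sse42.Crc32`, and on older runtimes you would need a native helper or a package such as Crc32C.NET.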
Crc32.NET is a safe .NET implementation along the lines of the above Crc32C.NET by Robert Važan, but for the Crc32 algorithm.
This library contains optimizations for managed code, so it really is faster than other Crc32 implementations. If you need exactly Crc32, this library is the best choice. This implementation was found to be the fastest of the variants tested. It also performs well on both x64 and x86, so there seems to be no point in maintaining two different implementations.
I have no idea which of the two .NET implementations above is faster for the classic CRC-32-IEEE algorithm. The performance comparison table does not reference the first implementation.
The answer from Anonymous Coward points to crcutil, a high-performance CRC reference implementation of a novel Multiword CRC algorithm invented by Andrew Kadatch and Bob Jenkins in early 2007. The new algorithm is heavily tuned towards modern Intel and AMD processors and is substantially faster than almost all other software CRC algorithms. Their 2010 paper "Everything we know about CRC but afraid to forget" is listed in the downloads. This paper shows some tricks that can be used to avoid reprocessing certain data ranges.
So try to be smart about what needs calculating once the amount of data becomes large enough or when the environment is limited.
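The simplest way to avoid recalculating is to keep the running CRC and feed it only the new bytes when data is appended. The C sketch below illustrates that idea for CRC-32-IEEE; it is not taken from the paper, which covers more advanced tricks, and the name `crc32_update` is made up for this example (it follows the same calling convention as zlib's `crc32`: start from 0 and pass the previous result back in).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: feed more bytes into a running CRC-32-IEEE.
   Pass crc = 0 for the first chunk, then the previous return value. */
uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;

    crc = ~crc;                          /* undo the final XOR of the previous call */
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return ~crc;                         /* apply the final XOR again */
}
```

With this, `crc32_update(crc32_update(0, "12345", 5), "6789", 4)` yields the same value as `crc32_update(0, "123456789", 9)`, so appending to a large buffer only costs a pass over the appended bytes. The bitwise inner loop is kept for brevity; a table-driven or hardware loop can be dropped in without changing the interface.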