I am trying to find information on glibc and to what extent it uses SSE functionality.
If it is optimized, can I take advantage of that out of the box?
Say I am using one of the larger Linux distributions: I assume its glibc is compiled to be as generic and portable as possible, and is therefore not optimized?
I am particularly interested in the functions memcpy and memcmp, and in how to make them as fast as possible.
glibc 2.8 does not use SSE for memcpy or memcmp at all (on x86 or x86_64); it uses hand-written assembly that avoids any instruction not supported by every CPU in the family. glibc 2.10 will add support for a new ELF symbol type, STT_GNU_IFUNC ("indirect functions"), which makes it possible to pick a better-optimized implementation at run time based on what the CPU supports.
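To illustrate the kind of selection IFUNC enables, here is a minimal sketch of runtime CPU dispatch in plain C, assuming GCC or Clang. It is not glibc's code: the names `my_memcmp`, `memcmp_impl`, `memcmp_sse2` and `memcmp_generic` are hypothetical. On x86/x86_64 it checks for SSE2 once at program start with `__builtin_cpu_supports` and stores the chosen implementation in a function pointer; on other targets it always uses the byte-by-byte fallback. With IFUNC, the dynamic linker runs a resolver function of this kind when the symbol is bound, so the program does not have to do it itself.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define HAVE_X86 1
#endif

/* Portable fallback: compare byte by byte (as unsigned char, like memcmp). */
static int memcmp_generic(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    for (size_t i = 0; i < n; i++) {
        if (p[i] != q[i])
            return p[i] < q[i] ? -1 : 1;
    }
    return 0;
}

#ifdef HAVE_X86
/* SSE2 path: compare 16 bytes per iteration, then locate the first difference. */
__attribute__((target("sse2")))
static int memcmp_sse2(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i x = _mm_loadu_si128((const __m128i *)(p + i));
        __m128i y = _mm_loadu_si128((const __m128i *)(q + i));
        /* One mask bit per byte; 0xFFFF means all 16 bytes are equal. */
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(x, y));
        if (mask != 0xFFFF) {
            /* Lowest zero bit of mask = first mismatching byte. */
            size_t j = i + (size_t)__builtin_ctz(~mask);
            return p[j] < q[j] ? -1 : 1;
        }
    }
    /* Tail shorter than 16 bytes. */
    return memcmp_generic(p + i, q + i, n - i);
}
#endif

typedef int (*memcmp_fn)(const void *, const void *, size_t);

/* Hypothetical dispatch pointer; defaults to the portable version. */
static memcmp_fn memcmp_impl = memcmp_generic;

/* Runs once before main(), roughly what an IFUNC resolver does at bind time. */
__attribute__((constructor))
static void init_memcmp(void)
{
#ifdef HAVE_X86
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        memcmp_impl = memcmp_sse2;
#endif
}

static int my_memcmp(const void *a, const void *b, size_t n)
{
    return memcmp_impl(a, b, n);
}
```

Unaligned loads keep the sketch short; glibc's actual optimized versions are considerably more elaborate (alignment handling, wider vector registers where available).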
If you compile with optimization enabled (e.g. -O2 or -O3), the compiler may replace calls to memcpy and memcmp with its own built-in code and never call glibc at all. In that case the -march and -mtune options (-mcpu in older GCC releases) determine which instructions the generated code may use and which CPU it is tuned for.
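As a rough illustration, assuming GCC or Clang: the two hypothetical helpers below call memcpy with a size known at compile time. Compiling with something like `gcc -O2 -march=native -S` and reading the assembly will usually show plain load/store instructions instead of a `call memcpy`; adding `-fno-builtin-memcpy` should bring the library call back. The exact output depends on the compiler version and target.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly unaligned buffer. With a constant
   size of 4, the compiler can usually emit a single load instead of
   calling memcpy in libc. */
static uint32_t load_u32(const unsigned char *buf)
{
    uint32_t v;
    memcpy(&v, buf, sizeof v);
    return v;
}

struct point { double x, y, z; };

/* Fixed-size struct copy: another candidate for inlining as a few moves. */
static void copy_point(struct point *dst, const struct point *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

For large or variable-length copies the compiler normally still calls the library memcpy, so glibc's implementation matters there.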