Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

compare two strings (nul-terminated) other than doing it byte-by-byte?

Like strlen() from glibc that performs a nice bit manipulation and 4 bytes checked per time making the function so fast, compared to a byte-by-byte routine as most all others do, is there something like this to compare two strings in assembly? I'm reading some pages on code implementation for C language, very interested in strings-handling part, but I still not found none like this. I have to make this function as fast possible because it's the heart of my application.(don't recommend hash table)

Any assembler is welcome. But I'm a bit familiar with intel's assembly syntax, if assembly that you'll go to provide is different, please comment it.

like image 496
Jack Avatar asked Dec 04 '22 12:12

Jack


2 Answers

You can compare word by word (eg. 32-bits or 64-bits at a time). You just need to be careful not to go past the end of the string. If you are making the strings, then you could pad them with zeroes so they are a multiple of the word size, then you don't even need to check.

like image 179
Michael Day Avatar answered May 10 '23 19:05

Michael Day


Assuming zero terminated strings (although the same applies for memcmp()); the fastest way to do string comparisons in assembly depends on the length/s of the strings, and the specific CPU.

In general; SSE or AVX has a high setup cost but gives faster throughput once it's running, which makes it the best choice if you're comparing very long strings (especially if most of the characters match).

Alternatively, something that does one byte at a time using general purpose registers will typically have a very low setup cost and lower throughput, which makes it the best choice if you're comparing lots of small strings (or even lots of large strings where the first few characters are likely to be different).

If you're doing this for a specific application, then you can determine the average number of characters compared and find the best approach for that average. You can also have different functions for different cases - e.g. implement a strcmp_small() and a strcmp_large() if there's a mixture.

Despite all this, if the performance of string comparisons matters a lot, then it's very likely that the fastest way to compare strings is not comparing strings at all. Basically, the words "I have to make this function as fast possible because it's the heart of my application" should make everyone wonder why better ways of implementing the application aren't possible.

like image 37
Brendan Avatar answered May 10 '23 18:05

Brendan