Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to find first occurrence of byte in buffer

Tags:

c++

Disclaimer

  • I have read What is the fastest substring search algorithm?, which may be suboptimal for the single character case.
  • strchr requires a NUL-terminated string

I am looking for the fastest way to identify the first occurrence of a given byte in a byte buffer.

This is reminiscent of looking for the first occurrence of a character in a string except that:

  • the byte buffer is not NUL-terminated, instead I have an explicit length (and possibly embedded NUL characters)
  • the byte buffer is not allocated in a string or vector, I am only handed down a slice (aka, pointer & length)

The basic solution is:

size_t search(char const* buffer, size_t length, char c) {
    return std::find(buffer, buffer + length, c) - buffer;
}

However, a quick round-trip with the Godbolt compiler (-O2 -msse2 -mavx) does not show any hint of a vectorized instruction, only some unrolling, so I am wondering whether this is the optimal.

Is there a faster way to find the first occurrence of a given byte in a buffer?

Note: only the first occurrence matters.

Note: I am exclusively concerned with modern x86_64 CPUs on Linux, though I encourage answers to be as generic as possible and mention assumptions clearly.

like image 635
Matthieu M. Avatar asked Nov 16 '16 13:11

Matthieu M.


1 Answers

you can use memchr , which is usually implemented as an intrinsic function, and is usually (from my experience) much faster than any hand-rolled loop.

http://en.cppreference.com/w/c/string/byte/memchr

Edit: at least on VC++ (and I bet on GCC as well, I haven't checked), std::find will use memchr anyway if you look for a byte , so I would check if memchr actually make the program run faster.

like image 140
David Haim Avatar answered Sep 22 '22 02:09

David Haim