Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast strlen with bit operations

I found this code

int strlen_my(const char *s)
{
    int len = 0;
    for(;;)
    {
        unsigned x = *(unsigned*)s;
        if((x & 0xFF) == 0) return len;
        if((x & 0xFF00) == 0) return len + 1;
        if((x & 0xFF0000) == 0) return len + 2;
        if((x & 0xFF000000) == 0) return len + 3;
        s += 4, len += 4;
    }
}

I'm very interested in knowing how it works. ¿Can anyone explain how it works?

like image 402
Kevin Avatar asked Sep 05 '15 23:09

Kevin


2 Answers

A bitwise AND with ones will retrieve the bit pattern from the other operand. Meaning, 10101 & 11111 = 10101. If the result of that bitwise AND is 0, then we know we know the other operand was 0. A result of 0 when ANDing a single byte with 0xFF (ones) will indicate a NULL byte.

The code itself checks each byte of the char array in four-byte partitions. NOTE: This code isn't portable; on another machine or compiler, an unsigned int could be more than 4 bytes. It would probably be better to use the uint32_t data type to ensure 32-bit unsigned integers.

The first thing to note is that on a little-endian machine, the bytes making up the character array will be read into an unsigned data type in reverse order; that is, if the four bytes at the current address are the bit pattern corresponding to abcd, then the unsigned variable will contain the bit pattern corresponding to dcba.

The second is that a hexadecimal number constant in C results in an int-sized number with the specified bytes at the little-end of the bit pattern. Meaning, 0xFF is actually 0x000000FF when compiling with 4-byte ints. 0xFF00 is 0x0000FF00. And so on.

So the program is basically looking for the NULL character in the four possible positions. If there is no NULL character in the current partition, it advances to the next four-byte slot.

Take the char array abcdef for an example. In C, string constants will always have null terminators at the end, so there's a 0x00 byte at the end of that string.

It'll work as follows:

Read "abcd" into unsigned int x:

x: 0x64636261 [ASCII representations for "dcba"]

Check each byte for a null terminator:

  0x64636261
& 0x000000FF
  0x00000061 != 0,

  0x64636261
& 0x0000FF00
  0x00006200 != 0,

And check the other two positions; there are no null terminators in this 4-byte partition, so advance to the next partition.

Read "ef" into unsigned int x:

x: 0xBF006665 [ASCII representations for "fe"]

Note the 0xBF byte; this is past the string's length, so we're reading in garbage from the runtime stack. It could be anything. On a machine that doesn't allow unaligned accesses, this will crash if the memory after the string is not 1-byte aligned. If there were just one character left in the string, we'd be reading two extra bytes, so the alignment of the memory adjacent to the char array would have to be 2-byte aligned.

Check each byte for a null terminator:

  0xBF006665
& 0x000000FF
  0x00000065 != 0,

  0xBF006665
& 0x0000FF00
  0x00006600 != 0,

  0xBF006665
& 0x00FF0000
  0x00000000 == 0 !!!

So we return len + 2; len was 4 since we incremented it once by 4, so we return 6, which is indeed the length of the string.

like image 59
Purag Avatar answered Sep 20 '22 12:09

Purag


Code "works" by attempting to read 4 bytes at a time by assuming the string is laid out and accessible like an array of int. Code reads the first int and then each byte in turn, testing if it is the null character. In theory, code working with int will run faster then 4 individualchar operations.

But there are problems:

Alignment is an issue: e.g. *(unsigned*)s may seg-fault.

Endian is an issue with if((x & 0xFF) == 0) might not get the byte at address s

s += 4 is a problem as sizeof(int) may differ from 4.

Array types may exceed int range, better to use size_t.


An attempt to right these difficulties.

#include <stddef.h>
#include <stdio.h>

static inline aligned_as_int(const char *s) {
  max_align_t mat; // C11
  uintptr_t i = (uintptr_t) s;
  return i % sizeof mat == 0;
}

size_t strlen_my(const char *s) {
  size_t len = 0;
  // align
  while (!aligned_as_int(s)) {
    if (*s == 0) return len;
    s++;
    len++;
  }
  for (;;) {
    unsigned x = *(unsigned*) s;
    #if UINT_MAX >> CHAR_BIT == UCHAR_MAX
      if(!(x & 0xFF) || !(x & 0xFF00)) break;
      s += 2, len += 2;
    #elif UINT_MAX >> CHAR_BIT*3 == UCHAR_MAX
      if (!(x & 0xFF) || !(x & 0xFF00) || !(x & 0xFF0000) || !(x & 0xFF000000)) break;
      s += 4, len += 4;
    #elif UINT_MAX >> CHAR_BIT*7 == UCHAR_MAX
      if (   !(x & 0xFF) || !(x & 0xFF00)
          || !(x & 0xFF0000) || !(x & 0xFF000000)
          || !(x & 0xFF00000000) || !(x & 0xFF0000000000)
          || !(x & 0xFF000000000000) || !(x & 0xFF00000000000000)) break;
      s += 8, len += 8;
    #else
      #error TBD code
    #endif
  }
  while (*s++) {
    len++;
  }
  return len;
}
like image 27
chux - Reinstate Monica Avatar answered Sep 18 '22 12:09

chux - Reinstate Monica