Understanding code in strlen implementation

I have two questions regarding the implementation of strlen in string.h in glibc.

  1. The implementation uses a magic number with 'holes'. I am not able to understand how this works. Can someone please help me understand this snippet:

    size_t
    strlen (const char *str)
    {
       const char *char_ptr;
       const unsigned long int *longword_ptr;
       unsigned long int longword, himagic, lomagic;
    
       /* Handle the first few characters by reading one character at a time.
          Do this until CHAR_PTR is aligned on a longword boundary.  */
       for (char_ptr = str; ((unsigned long int) char_ptr
                 & (sizeof (longword) - 1)) != 0;
            ++char_ptr)
         if (*char_ptr == '\0')
           return char_ptr - str;
    
       /* All these elucidatory comments refer to 4-byte longwords,
          but the theory applies equally well to 8-byte longwords.  */
    
       longword_ptr = (unsigned long int *) char_ptr;
    
       /* Bits 31, 24, 16, and 8 of this number are zero.  Call these bits
          the "holes."  Note that there is a hole just to the left of
          each byte, with an extra at the end:
    
          bits:  01111110 11111110 11111110 11111111
          bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
    
          The 1-bits make sure that carries propagate to the next 0-bit.
          The 0-bits provide holes for carries to fall into.  */
    
       himagic = 0x80808080L;
       lomagic = 0x01010101L;
       if (sizeof (longword) > 4)
         {
           /* 64-bit version of the magic.  */
           /* Do the shift in two steps to avoid a warning if long has 32 bits.  */
           himagic = ((himagic << 16) << 16) | himagic;
           lomagic = ((lomagic << 16) << 16) | lomagic;
         }
       if (sizeof (longword) > 8)
         abort ();

       /* Instead of the traditional loop which tests each character,
          we will test a longword at a time.  The tricky part is testing
          if *any of the four* bytes in the longword in question are zero.  */
       for (;;)
         {
           longword = *longword_ptr++;

           if (((longword - lomagic) & ~longword & himagic) != 0)
             {
               /* Which of the bytes was the zero?  If none of them were, it was
                  a misfire; continue the search.  */

               const char *cp = (const char *) (longword_ptr - 1);

               if (cp[0] == 0)
                 return cp - str;
               if (cp[1] == 0)
                 return cp - str + 1;
               if (cp[2] == 0)
                 return cp - str + 2;
               if (cp[3] == 0)
                 return cp - str + 3;
               if (sizeof (longword) > 4)
                 {
                   if (cp[4] == 0)
                     return cp - str + 4;
                   if (cp[5] == 0)
                     return cp - str + 5;
                   if (cp[6] == 0)
                     return cp - str + 6;
                   if (cp[7] == 0)
                     return cp - str + 7;
                 }
             }
         }
     }
    

    What is the magic number being used for?

  2. Why not simply increment a pointer until the null character is found and return the count? Is this approach faster? If so, why?

asked Jan 06 '16 by Neo

1 Answer

The magic numbers are used to examine 4 bytes (32 bits), or even 8 bytes (64 bits), in one go and test whether any of them is zero (the end of the string), instead of checking each byte individually.
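
For contrast, the simple byte-by-byte approach asked about in the second question would look roughly like this (a minimal sketch; the name naive_strlen is mine, this is not the glibc fallback code):

    #include <stddef.h>

    /* Naive strlen: advance one byte at a time until the terminating
       null byte is found.  One load, one compare and one branch per byte.  */
    size_t
    naive_strlen (const char *str)
    {
      const char *p = str;

      while (*p != '\0')
        ++p;
      return p - str;
    }

The magic-number version avoids paying that per-byte cost for most of the string.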

Here is one example to check for a null byte:

unsigned int v; // 32-bit word to check if any 8-bit byte in it is 0
bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
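
As a quick sanity check (the wrapper function and the test values here are my own, not part of the original answer), you can wrap that expression and try it on words with and without a zero byte:

    #include <stdbool.h>
    #include <stdio.h>

    /* Same expression as above, wrapped in a helper: true if any of the
       four bytes of v is zero.  */
    static bool
    has_zero_byte (unsigned int v)
    {
      return (~((((v & 0x7F7F7F7FU) + 0x7F7F7F7FU) | v) | 0x7F7F7F7FU)) != 0;
    }

    int
    main (void)
    {
      printf ("%d\n", has_zero_byte (0x41424344)); /* bytes 'A','B','C','D' -> 0 */
      printf ("%d\n", has_zero_byte (0x41420044)); /* contains a zero byte  -> 1 */
      return 0;
    }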

For more tricks like this, see Bit Twiddling Hacks.

The variant used here (quoting the 32-bit explanation from that page):

There is yet a faster method: use hasless(v, 1), which is defined on that page; it works in 4 operations and requires no subsequent verification. It simplifies to

#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)

The subexpression (v - 0x01010101UL) evaluates to a high bit set in any byte whenever the corresponding byte in v is zero or greater than 0x80. The subexpression ~v & 0x80808080UL evaluates to high bits set in bytes where the byte of v doesn't have its high bit set (so the byte was less than 0x80). Finally, by ANDing these two subexpressions, the result is the high bits set where the bytes in v were zero, since the high bits set due to a value greater than 0x80 in the first subexpression are masked off by the second.
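
To make the two subexpressions concrete, here is a small worked example (the sample word 0x484900FF, whose second-lowest byte is zero, is my own choice); it prints the intermediate values so you can see which high bit survives the final AND:

    #include <stdio.h>

    #define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)

    int
    main (void)
    {
      unsigned long v = 0x484900FFUL;              /* one byte is 0x00 */
      unsigned long sub1 = v - 0x01010101UL;       /* high bit set where a byte was 0 or > 0x80 */
      unsigned long sub2 = ~v & 0x80808080UL;      /* high bit set where a byte was < 0x80 */

      printf ("v            = 0x%08lx\n", v);           /* 0x484900ff */
      printf ("v - lomagic  = 0x%08lx\n", sub1);        /* 0x4747fffe */
      printf ("~v & himagic = 0x%08lx\n", sub2);        /* 0x80808000 */
      printf ("haszero (v)  = 0x%08lx\n", haszero (v)); /* 0x00008000 */
      return 0;
    }

The single remaining set bit sits in the high-bit "hole" of the byte that was zero, which is exactly what the glibc test (longword - lomagic) & ~longword & himagic looks for.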

Looking at one byte at a time costs at least as many CPU cycles as looking at a full integer value (register wide). In this algorithm, whole integers are checked to see whether they contain a zero byte. If not, only a few instructions are needed and the code can jump straight to the next full integer. Only when a zero byte is present does a further check determine its exact position.
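
Putting it together, a stripped-down 32-bit sketch of that scanning loop (my own illustration: it ignores the alignment prologue and the 64-bit path that the real glibc code handles) could look like this:

    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative word-at-a-time strlen: test four bytes per iteration and
       only inspect individual bytes once a word is flagged as containing a
       zero byte.  Assumes str is 4-byte aligned and that reading slightly
       past the terminator stays within valid memory, as real implementations
       arrange.  */
    size_t
    strlen_words (const char *str)
    {
      const uint32_t *wp = (const uint32_t *) str;
      const uint32_t himagic = 0x80808080U;
      const uint32_t lomagic = 0x01010101U;

      for (;;)
        {
          uint32_t w = *wp++;

          if (((w - lomagic) & ~w & himagic) != 0)
            {
              /* The word contains a zero byte; find its position.  */
              const char *cp = (const char *) (wp - 1);

              for (int i = 0; i < 4; i++)
                if (cp[i] == 0)
                  return cp - str + i;
            }
        }
    }

The per-byte scan runs at most once per call, so the byte-level work is paid only around the terminator instead of over the whole string.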

answered Sep 22 '22 by Danny_ds