Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing integers as fast as possible

I'm trying to optimize parsing strings to uint64 values in C. At the moment I use a naive solution:

uint64_t parse(const char *source)
{
   uint64_t res = 0;
   while (source[0] >= '0' && source[0] <= '9') {
       res = res * 10 + (source[0] - '0');
       ++source;
   }
   return res;
}

However, this is not that fast. My first optimization was actually to replace isdigit(c) by a simple comparison (which is way faster due to the implementation of isdigit). Unfortunately, that was as well my last optimization.

There are some posts which describe some bit magic tricks, but they always seem to assume fix sized integers: https://kholdstare.github.io/technical/2020/05/26/faster-integer-parsing.html

Are there any tricks to make integer parsing as well faster for the case that the integer has an unknown length?

Thanks

like image 860
Kevin Meier Avatar asked Oct 31 '25 08:10

Kevin Meier


1 Answers

Tested three different implementations:

Manual unroll is slightly faster than the OP's loop version.

#define GET1(result, c) do {result *= 10; if((c) >= '0' && (c) <= '9')  
result += (c) - '0'; else return result;} while(0)
  

uint64_t parse2(const char *source)
{
    uint64_t result = 0;
    GET1(result, source[0 ]);
    GET1(result, source[1 ]);
    GET1(result, source[2 ]);
    GET1(result, source[3 ]);
    GET1(result, source[4 ]);
    GET1(result, source[5 ]);
    GET1(result, source[6 ]);
    GET1(result, source[7 ]);
    GET1(result, source[8 ]);
    GET1(result, source[9 ]);
    GET1(result, source[10]);
    GET1(result, source[11]);
    GET1(result, source[12]);
    GET1(result, source[13]);
    GET1(result, source[14]);
    GET1(result, source[15]);
    GET1(result, source[16]);
    GET1(result, source[17]);
    GET1(result, source[18]);
    GET1(result, source[19]);
    return result;
}


#define GET(result, c)          \
      result *= 10;switch(c)                            \
   {                                    \
        case '0' ... '9':               \
            result += (c) - '0';        \
            break;                      \
        case 0:                         \
        default:                        \
            return result;              \
   }                                    


uint64_t parse1(const char *source)
{
    uint64_t result = 0;
    GET(result, source[0 ]);
    GET(result, source[1 ]);
    GET(result, source[2 ]);
    GET(result, source[3 ]);
    GET(result, source[4 ]);
    GET(result, source[5 ]);
    GET(result, source[6 ]);
    GET(result, source[7 ]);
    GET(result, source[8 ]);
    GET(result, source[9 ]);
    GET(result, source[10]);
    GET(result, source[11]);
    GET(result, source[12]);
    GET(result, source[13]);
    GET(result, source[14]);
    GET(result, source[15]);
    GET(result, source[16]);
    GET(result, source[17]);
    GET(result, source[18]);
    GET(result, source[19]);
    return result;
}

Results (1000000000 iterations - 20char long number):

OPs parse: 14.26654600 seconds 
parse2   : 13.39631300 seconds 
parse1   : 12.98047100 seconds
like image 145
0___________ Avatar answered Nov 02 '25 23:11

0___________



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!