Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

performance of log10 function returning an int

Today I needed a cheap log10 function, of which I only used the int part. Assuming the result is floored, so the log10 of 999 would be 2. Would it be beneficial writing a function myself? And if so, which way would be the best to go. Assuming the code would not be optimized.

The alternatives to log10 I've though of;

  • use a for loop dividing or multiplying by 10;
  • use a string parser(probably extremely expensive);
  • using an integer log2() function multiplying by a constant.

Thank you on beforehand:)

like image 584
laurisvr Avatar asked Sep 17 '14 13:09

laurisvr


1 Answers

The operation can be done in (fast) constant time on any architecture that has a count-leading-zeros or similar instruction (which is most architectures). Here's a C snippet I have sitting around to compute the number of digits in base ten, which is essentially the same task (assumes a gcc-like compiler and 32-bit int):

unsigned int baseTwoDigits(unsigned int x) {
    return x ? 32 - __builtin_clz(x) : 0;
}

static unsigned int baseTenDigits(unsigned int x) {
    static const unsigned char guess[33] = {
        0, 0, 0, 0, 1, 1, 1, 2, 2, 2,
        3, 3, 3, 3, 4, 4, 4, 5, 5, 5,
        6, 6, 6, 6, 7, 7, 7, 8, 8, 8,
        9, 9, 9
    };
    static const unsigned int tenToThe[] = {
        1, 10, 100, 1000, 10000, 100000, 
        1000000, 10000000, 100000000, 1000000000,
    };
    unsigned int digits = guess[baseTwoDigits(x)];
    return digits + (x >= tenToThe[digits]);
}

GCC and clang compile this down to ~10 instructions on x86. With care, one can make it faster still in assembly.

The key insight is to use the (extremely cheap) base-two logarithm to get a fast estimate of the base-ten logarithm; at that point we only need to compare against a single power of ten to decide if we need to adjust the guess. This is much more efficient than searching through multiple powers of ten to find the right one.

If the inputs are overwhelmingly biased to one- and two-digit numbers, a linear scan is sometimes faster; for all other input distributions, this implementation tends to win quite handily.

like image 149
Stephen Canon Avatar answered Nov 15 '22 20:11

Stephen Canon