Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can you quickly compute the integer logarithm for any base?

How can I quickly compute the integer logarithm for any base, not just base 10? This question has a really efficient solution for base 10 but I would like to understand how I can generalize that to other bases.

like image 874
Jan Schultke Avatar asked Aug 14 '20 10:08

Jan Schultke


1 Answers

Fundamentals

The logarithm base b of any number n log_b(n) can be computed using log(n) / log(b) where log is the logarithm with any base (usually the natural logarithm). log(b) is a constant, so if we can compute the logarithm for some base efficiently, we can compute the logarithm for any base efficiently.

Unfortunately, this conversion is only possible if we don't drop digits. For integers, we can only quickly compute the floored logarithm. For example, log_2(10) = 3. This would be the result for any number between 8 and 15, despite those having different amounts of decimal digits. So this binary logarithm can help us make a good guess, but we need to refine that guess.

Base 10

The aforementioned question has the following solution:

constexpr unsigned log2floor(uint64_t x) {
    // implementation for C++17 using clang or gcc
    return x ? 63 - __builtin_clzll(x) : 0;

    // implementation using the new C++20 <bit> header
    return x ? 63 - std::countl_zero(x) : 0;
}

constexpr unsigned log10floor(unsigned x) {
    constexpr unsigned char guesses[32] = {
        0, 0, 0, 0, 1, 1, 1, 2, 2, 2,
        3, 3, 3, 3, 4, 4, 4, 5, 5, 5,
        6, 6, 6, 6, 7, 7, 7, 8, 8, 8,
        9, 9
    };
    constexpr uint64_t powers[11] = {
        1, 10, 100, 1000, 10000, 100000, 1000000,
        10000000, 100000000, 1000000000, 10000000000
    };
    unsigned guess = guesses[log2floor(x)];
    return guess + (x >= powers[guess + 1]);
}

Note that I had to make some modifications because the solution isn't actually 100% correct.

As explained in the question, we make a guess based on the binary logarithm which can be computed very efficiently and then increment our guess if necessary.

The guess table can be computed using:

index -> log_10(exp(2, index)) = log_10(1 << index)

You can see that the table first has a 1 entry at index 4, because exp(2, 4) = 16 which has a floored log_10 of 1.

Example

Say we want to know the log_10(15):

  1. We compute log_2(15) = 3
  2. We lookup log_10(exp(2, 3)) = log_10(8) = 0. This is our initial guess.
  3. We lookup exp(10, guess + 1) = exp(10, 1) = 10.
  4. 15 >= 10, so our guess was too low and we return guess + 1 = 0 + 1 = 1 instead.

Generalization for any Base

To generalize this approach for any base, we must compute the lookup tables in a constexpr context. To compute the guess table, we need a naive logarithm implementation for any base first:

template <typename Uint>
constexpr Uint logFloor_naive(Uint val, unsigned base)  {
    Uint result = 0;
    while (val /= base) {
        ++result;
    }
    return result;
}

Now, we can compute the lookup tables:

#include <limits>
#include <array>

template <typename Uint, size_t BASE>
constexpr std::array<uint8_t, std::numeric_limits<Uint>::digits> makeGuessTable()
{
    decltype(makeGuessTable<Uint, BASE>()) result{};
    for (size_t i = 0; i < result.size(); ++i) {
        Uint pow2 = static_cast<Uint>(Uint{1} << i);
        result.data[i] = logFloor_naive(pow2, BASE);
    }
    return result;
}

// The maximum possible exponent for a given base that can still be represented
// by a given integer type.
// Example: maxExp<uint8_t, 10> = 2, because 10^2 is representable by an 8-bit unsigned
// integer but 10^3 isn't.
template <typename Uint, unsigned BASE>
constexpr Uint maxExp = logFloor_naive<Uint>(static_cast<Uint>(~Uint{0u}), BASE);

// the size of the table is maxPow<Uint, BASE> + 2 because we need to store the maximum power
// +1 because we need to contain it, we are dealing with a size, not an index
// +1 again because for narrow integers, we access guess+1
template <typename Uint, size_t BASE>
constexpr std::array<uint64_t, maxExp<Uint, BASE> + 2> makePowerTable()
{
    decltype(makePowerTable<Uint, BASE>()) result{};
    uint64_t x = 1;
    for (size_t i = 0; i < result.size(); ++i, x *= BASE) {
        result.data[i] = x;
    }
    return result;
}

Note that we need the maxExp templated constant to determine the size of the second lookup table. Finally, we can use the lookup tables to come up with the final function:

// If our base is a power of 2, we can convert between the
// logarithms of different bases without losing any precision.
constexpr bool isPow2or0(uint64_t val) {
    return (val & (val - 1)) == 0;
}

template <size_t BASE = 10, typename Uint>
constexpr Uint logFloor(Uint val) {
    if constexpr (isPow2or0(BASE)) {
        return log2floor(val) / log2floor(BASE);
    }
    else {
        constexpr auto guesses = makeGuessTable<Uint, BASE>();
        constexpr auto powers = makePowerTable<Uint, BASE>();

        uint8_t guess = guesses[log2floor(val)];
        
        // Accessing guess + 1 isn't always safe for 64-bit integers.
        // This is why we need this condition. See below for more details.
        if constexpr (sizeof(Uint) < sizeof(uint64_t)
            || guesses.back() + 2 < powers.size()) {
            return guess + (val >= powers[guess + 1]);
        }
        else {
            return guess + (val / BASE >= powers[guess]);
        }
    }
}

Notes on the powers Lookup Table

The reason why we always use uint64_t for our powers table is that we access guess + 1 and exp(10, guess + 1) isn't always representable. For example, if we are using 8-bit integers and have a guess of 2, then exp(10, guess + 1) would be 1000 which is not representable using an 8-bit integer.

Usually, this causes a problem for 64-bit integers, because there is no larger integer type available. But there are exceptions. For example, the largest power of 2 which is representable, exp(2, 63) is lower than exp(10, 19), which is the largest representable power of 10. This means that the highest guess will be 18 and exp(10, guess + 1) = exp(10, 19) is representable. Therefore, we can always safely access powers[guess + 1].

These exceptions are very useful because we can avoid an integer division in such cases. As seen above, exceptions like this can be detected with:

guesses.back() + 2 < powers.size()
like image 187
Jan Schultke Avatar answered Nov 17 '22 18:11

Jan Schultke