How can I quickly compute the integer logarithm for any base, not just base 10? This question has a really efficient solution for base 10 but I would like to understand how I can generalize that to other bases.
The logarithm base b
of any number n
log_b(n)
can be computed using log(n) / log(b)
where log
is the logarithm with any base (usually the natural logarithm). log(b)
is a constant, so if we can compute the logarithm for some base efficiently, we can compute the logarithm for any base efficiently.
Unfortunately, this conversion is only possible if we don't drop digits. For integers, we can only quickly compute the floored logarithm. For example, log_2(10) = 3
. This would be the result for any number between 8 and 15, despite those having different amounts of decimal digits. So this binary logarithm can help us make a good guess, but we need to refine that guess.
The aforementioned question has the following solution:
constexpr unsigned log2floor(uint64_t x) {
// implementation for C++17 using clang or gcc
return x ? 63 - __builtin_clzll(x) : 0;
// implementation using the new C++20 <bit> header
return x ? 63 - std::countl_zero(x) : 0;
}
constexpr unsigned log10floor(unsigned x) {
constexpr unsigned char guesses[32] = {
0, 0, 0, 0, 1, 1, 1, 2, 2, 2,
3, 3, 3, 3, 4, 4, 4, 5, 5, 5,
6, 6, 6, 6, 7, 7, 7, 8, 8, 8,
9, 9
};
constexpr uint64_t powers[11] = {
1, 10, 100, 1000, 10000, 100000, 1000000,
10000000, 100000000, 1000000000, 10000000000
};
unsigned guess = guesses[log2floor(x)];
return guess + (x >= powers[guess + 1]);
}
Note that I had to make some modifications because the solution isn't actually 100% correct.
As explained in the question, we make a guess based on the binary logarithm which can be computed very efficiently and then increment our guess if necessary.
The guess table can be computed using:
index -> log_10(exp(2, index)) = log_10(1 << index)
You can see that the table first has a 1
entry at index 4
, because exp(2, 4) = 16
which has a floored log_10
of 1
.
Say we want to know the log_10(15)
:
log_2(15) = 3
log_10(exp(2, 3)) = log_10(8) = 0
. This is our initial guess.exp(10, guess + 1) = exp(10, 1) = 10
.15 >= 10
, so our guess was too low and we return guess + 1 = 0 + 1 = 1
instead.To generalize this approach for any base, we must compute the lookup tables in a constexpr
context. To compute the guess table, we need a naive logarithm implementation for any base first:
template <typename Uint>
constexpr Uint logFloor_naive(Uint val, unsigned base) {
Uint result = 0;
while (val /= base) {
++result;
}
return result;
}
Now, we can compute the lookup tables:
#include <limits>
#include <array>
template <typename Uint, size_t BASE>
constexpr std::array<uint8_t, std::numeric_limits<Uint>::digits> makeGuessTable()
{
decltype(makeGuessTable<Uint, BASE>()) result{};
for (size_t i = 0; i < result.size(); ++i) {
Uint pow2 = static_cast<Uint>(Uint{1} << i);
result.data[i] = logFloor_naive(pow2, BASE);
}
return result;
}
// The maximum possible exponent for a given base that can still be represented
// by a given integer type.
// Example: maxExp<uint8_t, 10> = 2, because 10^2 is representable by an 8-bit unsigned
// integer but 10^3 isn't.
template <typename Uint, unsigned BASE>
constexpr Uint maxExp = logFloor_naive<Uint>(static_cast<Uint>(~Uint{0u}), BASE);
// the size of the table is maxPow<Uint, BASE> + 2 because we need to store the maximum power
// +1 because we need to contain it, we are dealing with a size, not an index
// +1 again because for narrow integers, we access guess+1
template <typename Uint, size_t BASE>
constexpr std::array<uint64_t, maxExp<Uint, BASE> + 2> makePowerTable()
{
decltype(makePowerTable<Uint, BASE>()) result{};
uint64_t x = 1;
for (size_t i = 0; i < result.size(); ++i, x *= BASE) {
result.data[i] = x;
}
return result;
}
Note that we need the maxExp
templated constant to determine the size of the second lookup table. Finally, we can use the lookup tables to come up with the final function:
// If our base is a power of 2, we can convert between the
// logarithms of different bases without losing any precision.
constexpr bool isPow2or0(uint64_t val) {
return (val & (val - 1)) == 0;
}
template <size_t BASE = 10, typename Uint>
constexpr Uint logFloor(Uint val) {
if constexpr (isPow2or0(BASE)) {
return log2floor(val) / log2floor(BASE);
}
else {
constexpr auto guesses = makeGuessTable<Uint, BASE>();
constexpr auto powers = makePowerTable<Uint, BASE>();
uint8_t guess = guesses[log2floor(val)];
// Accessing guess + 1 isn't always safe for 64-bit integers.
// This is why we need this condition. See below for more details.
if constexpr (sizeof(Uint) < sizeof(uint64_t)
|| guesses.back() + 2 < powers.size()) {
return guess + (val >= powers[guess + 1]);
}
else {
return guess + (val / BASE >= powers[guess]);
}
}
}
powers
Lookup TableThe reason why we always use uint64_t
for our powers
table is that we access guess + 1
and exp(10, guess + 1)
isn't always representable. For example, if we are using 8-bit integers and have a guess of 2
, then exp(10, guess + 1)
would be 1000
which is not representable using an 8-bit integer.
Usually, this causes a problem for 64-bit integers, because there is no larger integer type available. But there are exceptions. For example, the largest power of 2 which is representable, exp(2, 63)
is lower than exp(10, 19)
, which is the largest representable power of 10. This means that the highest guess will be 18
and exp(10, guess + 1) = exp(10, 19)
is representable. Therefore, we can always safely access powers[guess + 1]
.
These exceptions are very useful because we can avoid an integer division in such cases. As seen above, exceptions like this can be detected with:
guesses.back() + 2 < powers.size()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With