Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compute sqrt(SIZE_MAX+1) using only integer constant expressions, catering for weird ABIs

OpenBSD's C library has an extension called reallocarray(3) which does realloc(array, size*nmemb) without blowing up if the multiplication overflows. The implementation contains this fragment:

/*
 * This is sqrt(SIZE_MAX+1), as s1*s2 <= SIZE_MAX
 * if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW
 */
#define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))

Over on Programmers.SE a modified version of that calculation got dinged for technical incorrectness. 4 should obviously be CHAR_BIT/2, but that's not the only problem. Suppose an unusual ABI in which size_t has padding bits. (This is not ridiculously implausible: consider a microcontroller with 32-bit registers but a 24-bit address space.) Then SIZE_MAX is less than 1 << (sizeof(size_t)*CHAR_BIT) [in infinite-precision arithmetic] and the calculation is wrong.

So, question: Can you compute floor(sqrt(SIZE_MAX+1)) using only C99 integer-constant-expression arithmetic, making no assumptions whatsoever about the ABI other than what C99 requires plus what you can learn from <limits.h>? Note that SIZE_MAX may equal UINTMAX_MAX, i.e. there may not be any type that can represent SIZE_MAX+1 without overflow.

EDIT: I think SIZE_MAX is required to be 2n − 1 for some positive integer n, but is not necessarily of the form 22n − 1 — consider S/390, one of whose ABIs has a 31-bit address space. Therefore: If sqrt(SIZE_MAX+1) is not an integer, the desired result (given how this constant is used) is floor() of the true value.

like image 583
zwol Avatar asked Sep 02 '14 13:09

zwol


2 Answers

The constant SIZE_MAX is non-negative and has type size_t. For shortness, I will define:

  #define S SIZE_MAX  

The mathematical value S+1 is or can be, as you pointed out, out of range for any integer type.
I will write S1 for the mathemtaical value of S+1.
If we consider the logarithm (in base 2, if you want) of S1, then we have:

 logarithm(sqrt(S1)) == (1.0/2.0) logarithm(S1)

On the other hand, in almost-for-sure every realistic situation, we will have that S is represented as a binary number having only 1 bits. The number b of this bits is, in general, the number CHAR_BIT multiplied by a power of two, multiplied by CHAR_BIT: 16, 32, 64, 128... I will denote the exponent of this power by p. Thus, for CHAR_BIT == 8, we have:

16 == CHAR_BIT * 2 ----> p == 1
32 == CHAR_BIT * 4 ----> p == 2
64 == CHAR_BIT * 8 ----> p == 3

Now we have:

logarithm(S1) == b == CHAR_BIT * (2 ** p)   (I am denoting with ** to the "power math. operator").

logarithm(sqrt(S1)) == logaritm(S1) / 2.0 == CHAR_BIT * (2 ** p) / 2.0 == CHAR_BIT * (2 ** (p - 1))

By assuming or knowing that every bit in size_t is used only to represent the bits of an integer number, we have this equality, por some (unknown) value of p:

sizeof(size_t) == b == CHAR_BIT * (2 ** p)

We can assume, for the state-of-the-art in 2014, that the value of p <= 5, say (you can grow this magic number 5 to bigger values in what follows).

Now, consider the following expression, intended to "search and found" the value of b, under the assumption that p <= 5:

#define S_1 ((size_t)1ULL)
#define b (sizeof(size_t))
#define bitexpr(p) ((size_t)(CHAR_BIT * (S_1 << (p))))
#define expr(p) ((size_t) (S_1 << (p)))
#define exp2_expr_1(p) ((size_t)(S_1 << bitexpr(p-1)))
// SRSM() stands for: Square Root SizeMax
#define SRSM  ( \
    (expr(1)==b)? exp2_expr_1(1) :             \
        (expr(2)==b)? exp2_expr_1(2) :         \
            (expr(3)==b)? exp2_expr_1(3) :     \
              (expr(4)==b)? exp2_expr_1(4) :   \
                (expr(5)==b)? exp2_expr_1(5) : \
                   (size_t)0  /* Error! */     \
    ) /* end-of-macro*/

The macro SRSM brings, actually, the square root of S+1, but I suppose you can figure out what to do with this number.

What is important here is that the square root of SIZE_MAX can be obtained by using purely integer constant expressions.

If you want, the "magic" number 5 can be changed by another one.

A more general approach, intended to solve an arbitrary situation, on any possible machine fitting the standard, it would be more complicated.
The method used in this post is independent of the value that has CHAR_BIT, but it uses that the number of bytes is a power of 2.

EDITED: I changed a little the method for "search", starting by 1 and then growing up, to avoid possible "false" matches with << operator and big numbers (one never knows...). Now, the first match is, for sure, the correct.

like image 176
pablo1977 Avatar answered Sep 19 '22 10:09

pablo1977


An alternative to solve the sqrt(SIZE_MAX+1) is to look at the higher level goal shown here and below.

/* s1*s2 <= SIZE_MAX if s1 < K and s2 < K, where K = sqrt(SIZE_MAX+1) */
const size_t MUL_NO_OVERFLOW = ((size_t)1) << (sizeof(size_t) * 4);
if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
    nmemb > 0 && SIZE_MAX / nmemb < size)
  abort();

A simple overflow detection could use the following. This unfortunately performs division - an expansive test, yet otherwise "works". No MUL_NO_OVERFLOW needed.

if (nmemb > 0 && (SIZE_MAX / nmemb < size))
  abort();

Instead of dividing, OP's code does 2 simple compares against MUL_NO_OVERFLOW to eliminate most contenders.

if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) && ...

Yet the pre-processor computation of MUL_NO_OVERFLOW is challenging , hence this post.

The alternative is to use an easy to compute MUL_NO_OVERFLOW_ALT to eliminate many size, nmemb pairs from the division test.

// good for up to SIZE_MAX + 1 == 2**128
#define SQN(x) ( !!(x) )
#define SQ0(x) ( (x) >= 0x2u ? 1 + SQN((x)/0x2u) : SQN(x) )
#define SQ1(x) ( (x) >= 0x4u ? 2 + SQ0((x)/0x4u) : SQ0(x) )
#define SQ2(x) ( (x) >= 0x10u ? 4 + SQ1((x)/0x10u) : SQ1(x) )
#define SQ3(x) ( (x) >= 0x100u ? 8 + SQ2((x)/0x100u) : SQ2(x) )
#define SQ4(x) ( (x) >= 0x10000u ? 16 + SQ3((x)/0x10000u) : SQ3(x) )
#define SQ5(x) ( (x) >= 0x100000000u ? 32 + SQ4((x)/0x100000000u) : SQ4(x) )
#define SQ6(x) ( (x)/0x100000000u >= 0x100000000u ? 64 + SQ5((x)/0x1000000000000000u/16) : SQ5(x) )
#define MUL_NO_OVERFLOW_ALT (((size_t)1) << (SQ6(SIZE_MAX)/2))

printf("%zx %zx\n", SIZE_MAX, MUL_NO_OVERFLOW_ALT);
// Output
// ffffffff 10000

In OP's case, even if the best (size_t)sqrt(SIZE_MAX + 1.0) was possible at compile time, the division test would still be needed to pass legitimate pairs of size, nmemb like 4, SIZE_MAX/100. So for this task, the best ISQRT(SIZE_MAX + 1.0) is not needed, just something near and not greater than.

With this code the best value is calculated when SIZE_MAX + 1 is an even power of 2 - a common case.

like image 43
chux - Reinstate Monica Avatar answered Sep 21 '22 10:09

chux - Reinstate Monica