
C algorithm for calculating norm/inner product

Tags: c, algorithm, math

I need to check whether a point in R^2 lies inside a circle of relatively large radius r (up to 10^5). Normally I would just compare the squared norm x^2 + y^2 to r^2, but this is an embedded environment, and that will not work on int32_t values this large because the squares overflow the type (the widest native types are 32 bits).

Possible solutions:

I could manually kludge a 64 bit product out of two 32 bit ints (probably what I'll end up doing).

I could divide everything by 10 (or any value) and then do the usual inner product comparison, but I lose precision.

I could try to check inside an n-gon inscribed in the circle, but that is a lot of calculation, tables, etc., and I still lose precision.

Is there an algorithm that is typically used for this sort of thing?

TrivialCase asked Jan 09 '21 22:01



2 Answers

I'm afraid computing the 64-bit results is the simplest solution. Check if your compiler can generate efficient inline code for this:

int check_distance(int x, int y, int r) {
    return (long long)x * x + (long long)y * y <= (long long)r * r;
}

If the generated code seems too slow, you can add a test to check whether the 64-bit operation is required. Assuming x, y and r are non-negative, here is a solution using unsigned arithmetic and exact-width types from <stdint.h>:

int check_distance(uint32_t x, uint32_t y, uint32_t r) {
    if (x <= 46340 && y <= 46340 && r <= 0xffff) {
        /* 32-bit unsigned expression does not overflow */
        return x * x + y * y <= r * r;
    } else {
        return (uint64_t)x * x + (uint64_t)y * y <= (uint64_t)r * r;
    }
}

Notice the constant 46340, which is floor(sqrt(pow(2, 31))): if both x and y are greater than this value, x*x + y*y will exceed 2^32.
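The two thresholds can be double-checked by evaluating the squares in a wider type. This is only a sketch for inspecting the bounds on a host machine; the names XY_LIMIT, R_LIMIT, sum_sq_wide and fits_32bit are made up for illustration and are not part of the code above.

```c
#include <stdint.h>

/* Illustrative names, not taken from the answer's code. */
#define XY_LIMIT 46340u   /* floor(sqrt(2^31)) */
#define R_LIMIT  0xffffu  /* 65535 */

/* x*x + y*y computed in 64 bits, so the check itself cannot overflow. */
uint64_t sum_sq_wide(uint32_t x, uint32_t y)
{
    return (uint64_t)x * x + (uint64_t)y * y;
}

/* Returns 1 when the plain 32-bit comparison is safe for these inputs. */
int fits_32bit(uint32_t x, uint32_t y, uint32_t r)
{
    return x <= XY_LIMIT && y <= XY_LIMIT && r <= R_LIMIT;
}
```

With these helpers, sum_sq_wide(46340, 46340) is 4294791200, which still fits in 32 bits, while sum_sq_wide(46341, 46341) is 4294976562, which does not. Likewise 65535 * 65535 = 4294836225 fits, and 65536 * 65536 is exactly 2^32.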

Here is an alternative with a quicker test, but that will fall back to 64-bit operation for slightly smaller values:

int check_distance(uint32_t x, uint32_t y, uint32_t r) {
    if ((x | y | r) <= 0x7fff) {
        /* 32-bit unsigned expression does not overflow */
        return x * x + y * y <= r * r;
    } else {
        return (uint64_t)x * x + (uint64_t)y * y <= (uint64_t)r * r;
    }
}
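The unsigned versions assume non-negative inputs. If the coordinates arrive as signed int32_t values, as in the question, one option is to take their magnitudes first, since squaring discards the sign. The sketch below assumes that approach; abs_u32 and check_distance_signed are hypothetical names, and the 64-bit comparison stands in for whichever variant you end up using.

```c
#include <stdint.h>

/* Hypothetical helper: magnitude of v as an unsigned value.
 * Negating in unsigned arithmetic keeps INT32_MIN well defined. */
static uint32_t abs_u32(int32_t v)
{
    return v < 0 ? 0u - (uint32_t)v : (uint32_t)v;
}

/* Returns 1 if the point (x, y) lies inside or on the circle of radius r
 * centered at the origin. */
int check_distance_signed(int32_t x, int32_t y, uint32_t r)
{
    uint32_t ax = abs_u32(x);
    uint32_t ay = abs_u32(y);
    return (uint64_t)ax * ax + (uint64_t)ay * ay <= (uint64_t)r * r;
}
```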

Then, if you really don't want to use the compiler's 64-bit arithmetic, you can write the computation explicitly. Given the specified range of x, y and r (<= 100000), shifting the values right by 2 bits keeps the high parts of x and y below 46340:

int check_distance(uint32_t x, uint32_t y, uint32_t r) {
    if (x <= 46340 && y <= 46340 && r <= 0xffff) {
        /* 32-bit unsigned expression does not overflow */
        return x * x + y * y <= r * r;
    } else {
        /* shift all values right 2 bits to keep them below 46340 */
        uint32_t x0 = x & 3;
        uint32_t y0 = y & 3;
        uint32_t r0 = r & 3;
        uint32_t x1 = x >> 2;
        uint32_t y1 = y >> 2;
        uint32_t r1 = r >> 2;
        uint32_t x2_lo = x0 * (x0 + x1 * 8);
        uint32_t y2_lo = y0 * (y0 + y1 * 8);
        uint32_t d2_lo = x2_lo + y2_lo;
        uint32_t d2_hi = x1 * x1 + y1 * y1 + (d2_lo >> 4);
        uint32_t r2_lo = r0 * (r0 + r1 * 8);
        uint32_t r2_hi = r1 * r1 + (r2_lo >> 4);
        return d2_hi < r2_hi || (d2_hi == r2_hi && (d2_lo & 15) <= (r2_lo & 15));
    }
}

Finally, shifting values by 5 bits allows for numbers up to 1000000:

int check_distance(uint32_t x, uint32_t y, uint32_t r) {
    if (x <= 46340 && y <= 46340 && r <= 0xffff) {
        /* 32-bit unsigned expression does not overflow */
        return x * x + y * y <= r * r;
    } else {
        /* shift all values right 5 bits to keep them below 46340 */
        uint32_t x0 = x & 31;
        uint32_t y0 = y & 31;
        uint32_t r0 = r & 31;
        uint32_t x1 = x >> 5;
        uint32_t y1 = y >> 5;
        uint32_t r1 = r >> 5;
        uint32_t x2_lo = x0 * (x0 + x1 * 64);
        uint32_t y2_lo = y0 * (y0 + y1 * 64);
        uint32_t d2_lo = x2_lo + y2_lo;
        uint32_t d2_hi = x1 * x1 + y1 * y1 + (d2_lo >> 10);
        uint32_t r2_lo = r0 * (r0 + r1 * 64);
        uint32_t r2_hi = r1 * r1 + (r2_lo >> 10);
        return d2_hi < r2_hi || (d2_hi == r2_hi && (d2_lo & 1023) <= (r2_lo & 1023));
    }
}
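Both split versions rely on the same identity: writing v = v1 * 2^k + v0 gives v*v = v1*v1 * 2^(2k) + v0 * (v0 + v1 * 2^(k+1)). As a sketch, the shift amount can be made a parameter. The function name check_distance_split and the parameter k are assumptions for illustration, and the caller is assumed to pick k so that every intermediate fits in 32 bits (k = 2 for inputs up to 100000, k = 5 for inputs up to 1000000).

```c
#include <stdint.h>

/* Exact test of x*x + y*y <= r*r using only 32-bit arithmetic.
 * Assumption: k is large enough that no intermediate overflows. */
int check_distance_split(uint32_t x, uint32_t y, uint32_t r, unsigned k)
{
    uint32_t mask    = ((uint32_t)1 << k) - 1;        /* low k bits  */
    uint32_t lo_mask = ((uint32_t)1 << (2 * k)) - 1;  /* low 2k bits */

    uint32_t x0 = x & mask, x1 = x >> k;
    uint32_t y0 = y & mask, y1 = y >> k;
    uint32_t r0 = r & mask, r1 = r >> k;

    /* cross term plus low square: v0 * (v0 + v1 * 2^(k+1)) */
    uint32_t d2_lo = x0 * (x0 + (x1 << (k + 1)))
                   + y0 * (y0 + (y1 << (k + 1)));
    uint32_t r2_lo = r0 * (r0 + (r1 << (k + 1)));

    /* high parts in units of 2^(2k), including the carry from the low parts */
    uint32_t d2_hi = x1 * x1 + y1 * y1 + (d2_lo >> (2 * k));
    uint32_t r2_hi = r1 * r1 + (r2_lo >> (2 * k));

    /* compare (hi, lo) pairs lexicographically */
    return d2_hi < r2_hi
        || (d2_hi == r2_hi && (d2_lo & lo_mask) <= (r2_lo & lo_mask));
}
```

Calling it with k = 2 reproduces the 2-bit version's arithmetic, and k = 5 the 5-bit one. Keeping k a compile-time constant, as in the versions above, lets the compiler fold the masks and shifts.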

All of the above versions produce exact results for the specified ranges. If you do not require an exact result, you can just shift the values to bring them within the proper range and perform the 32-bit computation:

int check_distance(uint32_t x, uint32_t y, uint32_t r) {
    while (x > 46340 || y > 46340 || r > 0xffff) {
        x >>= 1;
        y >>= 1;
        r >>= 1;
    }
    /* 32-bit unsigned expression no longer overflows */
    return x * x + y * y <= r * r;
}
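To get a feel for the error the shifting version can introduce, the sketch below puts it next to a 64-bit reference. The function names are illustrative only. With r = 100000, the point (100003, 0) lies outside the circle, but after two right shifts both x and r become 25000, so the approximate test reports it as inside.

```c
#include <stdint.h>

/* Approximate test: halve all three values until the squares fit in 32 bits. */
int check_distance_approx(uint32_t x, uint32_t y, uint32_t r)
{
    while (x > 46340 || y > 46340 || r > 0xffff) {
        x >>= 1;
        y >>= 1;
        r >>= 1;
    }
    return x * x + y * y <= r * r;
}

/* Exact 64-bit reference, used here only for comparison. */
int check_distance_exact(uint32_t x, uint32_t y, uint32_t r)
{
    return (uint64_t)x * x + (uint64_t)y * y <= (uint64_t)r * r;
}

/* Returns 1 when the approximate test disagrees with the exact one. */
int approx_differs(uint32_t x, uint32_t y, uint32_t r)
{
    return check_distance_approx(x, y, r) != check_distance_exact(x, y, r);
}
```

Errors of this kind only show up for points close to the boundary, where the discarded low bits decide the outcome.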

chqrlie answered Oct 23 '22 00:10


With the premises you describe, the maximum radius is 100000, which gives a squared radius of 10 000 000 000. As all numbers are non-negative, that value needs 34 bits as an unsigned integer, and the largest sum x^2 + y^2 (2 * 10^10, with both coordinates at 100000) needs 35 bits.
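The bit counts can be checked with a small helper. This is a sketch meant for a desktop check rather than for the target, since it uses a 64-bit type; the name bits_needed is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: number of bits needed to store v (0 needs 0 bits). */
unsigned bits_needed(uint64_t v)
{
    unsigned n = 0;
    while (v != 0) {
        v >>= 1;
        n++;
    }
    return n;
}
```

Here bits_needed(10000000000) gives 34 for r^2 at the maximum radius, and bits_needed(20000000000) gives 35 for the largest possible x^2 + y^2, which is three bits more than a 32-bit type holds. After dropping two bits from each coordinate, 2 * 25000^2 = 1250000000 needs only 31 bits.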

Based on these premises, and given that you have no easy way to get 64-bit integers and only a few bits of excess, we can scale the squared norm by four bits (two bits in the source coordinates) to handle coordinates up to 100000 in a 32-bit unsigned integer.

In the first edition of this answer I assumed that shifting the coordinates by one bit (two bits in the calculated norm) was enough to handle the full set of values, at the cost of one bit of precision, but I was wrong: one more bit was needed. The results must be shifted by at least three bits to hold the full set of inputs, so I decided to scale the coordinates by two bits, which scales the results by four. I also decided to always scale, and to return the fraction of a square unit as an integer ranging from 0 to 15 (in sixteenths of a square unit). You get exact results by first comparing the integer parts of the two values and using the fractional parts only when the integer parts match. This makes the computation and the meaning of the returned results more coherent than before, and it is exact for integer coordinates.

You requested a working implementation, so I have posted one for you below:

#include <stdio.h>
#include <stdint.h>

/* Calculate the square of x/4: return its integer part and
 * accumulate the fraction (in sixteenths of a square unit)
 * into the variable pointed to by frac_p.  */
uint32_t
square_of_div16(uint32_t x, int *frac_p)
{
    /* we use (IP + FP)^2 = IP^2 + 2*IP*FP + FP^2 */

    uint32_t int_part    = x >>  2;                /* divide by four */
    uint32_t frac_part   = x & 0x3;                /* mod 4 */
    uint32_t int_result  = int_part  * int_part;   /* square of IP */
    int      frac_result = frac_part * frac_part;  /* square of FP */
    uint32_t mixed_prod  = int_part  * frac_part;  /* IP*FP */

    int_result  +=  mixed_prod >> 1;
    frac_result += (mixed_prod & 1) << 3;
    if (frac_result >= 0x10) { /* carry process */
        int_result  += frac_result >> 4;
        frac_result &= 0x0f;
    }
    if (frac_p) *frac_p += frac_result; /* accumulate */
    return int_result;
}

/* This calculates the squared norm of (x/4, y/4), that is, the
 * squared norm of (x, y) scaled to one sixteenth.
 * The ref_fraction pointer refers to a variable that receives
 * the fraction, in sixteenths of a square unit (0 to 15).  If
 * you are not interested in the fraction value, you can just
 * pass NULL as parameter. */
uint32_t
norm_scaled(uint32_t x, uint32_t y, int *ref_fraction)
{
    int fraction = 0;
    uint32_t result = 0;

    result += square_of_div16(x, &fraction);
    result += square_of_div16(y, &fraction);

    /* carry process: the two fractions can add up to 16 or more */
    result   += fraction >> 4;
    fraction &= 0x0f;

    if (ref_fraction)
        *ref_fraction = fraction; /* the excess, 0 to 15 */

    return result;
}

/* TEST MAIN PROGRAM.  Reads pairs of coordinates, one pair per
 * line (separated by spaces), and calculates the squared norm
 * of the vector scaled by 1/16, with the fraction of the value
 * in sixteenths of a square unit.  The value is computed
 * twice: with uint32_t integers and with double floating
 * point numbers. */
int main(void)
{
    char line[256];
    while (fgets(line, sizeof line, stdin) != NULL) {
        unsigned x = 0, y = 0; /* %u requires unsigned arguments */
        int fraction = 0;

        if (sscanf(line, "%u%u", &x, &y) != 2)
            continue; /* skip malformed lines */

        uint32_t norm_16th = norm_scaled(x, y, &fraction);

        printf("Trying (%u, %u) => %u (fraction = %d/16)\n",
                x, y, norm_16th, fraction);

        double norm_sq_16th
            = (double) x/4.0 * (double)x/4.0
            + (double) y/4.0 * (double)y/4.0;

        printf("squared norm scaled: %.8f\n", norm_sq_16th);
    }
    printf("Program finished\n");
}

The function square_of_div16 calculates the square of a number divided by four, that is, its square scaled by 1/16, so we can reuse it for the squares of both the x and y coordinates. It takes a pointer frac_p to an integer variable in which it accumulates the fractional part (in sixteenths of a square unit).

The function norm_scaled then calculates the squared norm by calling square_of_div16 for each coordinate and adding both results. The fractional parts of both calls are accumulated, and the total is passed back through the pointer ref_fraction. Carry processing is done here to give correct results.

Finally, the main() routine reads pairs of coordinates from standard input and calculates the scaled squared norm of the resulting vector twice: by calling the function, and by applying the Pythagorean formula to double values. The results should be the same in all cases.
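The comparison described earlier (integer parts first, then the sixteenths when they tie) is not part of the program above. A minimal self-contained sketch of it could look like this; square_split16 and in_circle_scaled are hypothetical names, and the inputs are assumed to stay within the range discussed here (up to 100000).

```c
#include <stdint.h>

/* Hypothetical helper: split v*v into hi*16 + lo with 0 <= lo < 16,
 * using only 32-bit arithmetic.  Here v = 4*ip + fp. */
static void square_split16(uint32_t v, uint32_t *hi, uint32_t *lo)
{
    uint32_t ip  = v >> 2;              /* integer part of v/4 */
    uint32_t fp  = v & 3;               /* remainder, in quarters */
    uint32_t low = fp * (fp + 8 * ip);  /* 8*ip*fp + fp*fp */

    *hi = ip * ip + (low >> 4);
    *lo = low & 15;
}

/* Returns 1 if x*x + y*y <= r*r. */
int in_circle_scaled(uint32_t x, uint32_t y, uint32_t r)
{
    uint32_t xh, xl, yh, yl, rh, rl;

    square_split16(x, &xh, &xl);
    square_split16(y, &yh, &yl);
    square_split16(r, &rh, &rl);

    uint32_t dl = xl + yl;               /* 0 to 30 sixteenths */
    uint32_t dh = xh + yh + (dl >> 4);   /* carry into the integer part */
    dl &= 15;

    /* integer parts decide; the sixteenths break ties */
    return dh < rh || (dh == rh && dl <= rl);
}
```

For example, in_circle_scaled(60000, 80000, 100000) returns 1 because the point lies exactly on the circle, while in_circle_scaled(60001, 80000, 100000) returns 0.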

Luis Colorado answered Oct 23 '22 02:10