Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to find divisibility

Professor says this isn't a efficient algorithm to check whether the number is divisible by a number from 100,000-150,000. I'm having trouble finding a better way. Any help would be appreciated.

unsigned short divisibility_check(unsigned long n) {
    unsigned long i;
    for (i = 100000; i <= 150000; i++) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}
like image 691
phirom peterschmidt Avatar asked Feb 04 '23 21:02

phirom peterschmidt


1 Answers

Let's say you need to find whether a positive integer K is divisible by a number between 100,000 and 150,000, and it is such a rare operation, that doing precalculations is just not worth the processor time or memory used.

If K < 100,000, it cannot be divisible by a number between 100,000 and 150,000.

If 100,000 ≤ K ≤ 150,000, it is divisible by itself. It is up to you to decide whether this counts or not.

For a K > 150,000 to be divisible by M, with 100,000 ≤ M ≤ 150,000, K must also be divisible by L = K / M. This is because K = L × M, and all three are positive integers. So, you only need to test the divisibility of K by a set of L, where ⌊ K / 150,000 ⌋ ≤ L ≤ ⌊ K / 100,000 ⌋.

However, that set of Ls becomes larger than the set of possible Ms when K > = 15,000,000,000. Then it is again less work to just test K for divisibility against each M, much like OP's code is now.


When implementing this as a program, the most important thing in practice is, surprisingly, the comments you add. Do not write comments that describe what the code does; write comments that explain the model or algorithm you are trying to implement (say, at the function level), and your intent of what each small block of code should accomplish.

In this particular case, you should probably add a comment to each if clause, explaining your reasoning, much like I did above.

Beginner programmers often omit comments completely. It is unfortunate, because writing good comments is a hard habit to pick up afterwards. It is definitely a good idea to learn to comment your code (as I described above -- the comments that describe what the code does are less than useful; more noise than help), and keep honing your skill on that.

A programmer whose code is maintainable, is worth ten geniuses who produce write-only code. This is because all code has bugs, because humans make errors. To be an efficient developer, your code must be maintainable. Otherwise you're forced to rewrite each buggy part from scratch, wasting a lot of time. And, as you can see above, "optimization" at the algorithmic level, i.e. thinking about how to avoid having to do work, yields much better results than trying to optimize your loops or something like that. (You'll find in real life that surprisingly often, optimizing a loop in the proper way, removes the loop completely.)

Even in exercises, proper comments may be the difference between "no points, this doesn't work" and "okay, I'll give you partial credit for this one, because you had a typo/off-by-one bug/thinko on line N, but otherwise your solution would have worked".


As bolov did not understand how the above leads to a "naive_with_checks" function, I'll show it implemented here.

For ease of testing, I'll show a complete test program. Supply the range of integers to test, and the range of divisors accepted, as parameters to the program (i.e. thisprogram 1 500000 100000 150000 to duplicate bolov's tests).

#include <stdlib.h>
#include <inttypes.h>
#include <limits.h>
#include <locale.h>
#include <ctype.h>
#include <stdio.h>
#include <errno.h>

int is_divisible(const uint64_t  number,
                 const uint64_t  minimum_divisor,
                 const uint64_t  maximum_divisor)
{
    uint64_t divisor, minimum_result, maximum_result, result;

    if (number < minimum_divisor) {
        return 0;
    }

    if (number <= maximum_divisor) {
        /* Number itself is a valid divisor. */
        return 1;
    }

    minimum_result = number / maximum_divisor;
    if (minimum_result < 2) {
        minimum_result = 2;
    }

    maximum_result = number / minimum_divisor;
    if (maximum_result < minimum_result) {
        maximum_result = minimum_result;
    }

    if (maximum_result - minimum_result > maximum_divisor - minimum_divisor) {
        /* The number is so large that it is the least amount of work
           to check each possible divisor. */
        for (divisor = minimum_divisor; divisor <= maximum_divisor; divisor++) {
            if (number % divisor == 0) {
                return 1;
            }
        }

        return 0;

    } else {
        /* There are fewer possible results than divisors,
           so we check the results instead. */

        for (result = minimum_result; result <= maximum_result; result++) {
            if (number % result == 0) {
                divisor = number / result;
                if (divisor >= minimum_divisor && divisor <= maximum_divisor) {
                    return 1;
                }
            }
        }

        return 0;
    }
}

int parse_u64(const char *s, uint64_t *to)
{
    unsigned long long  value;
    const char         *end;

    /* Empty strings are not valid. */
    if (s == NULL || *s == '\0')
        return -1;

    /* Parse as unsigned long long. */
    end = s;
    errno = 0;
    value = strtoull(s, (char **)(&end), 0);
    if (errno == ERANGE)
        return -1;
    if (end == s)
        return -1;

    /* Overflow? */
    if (value > UINT64_MAX)
        return -1;

    /* Skip trailing whitespace. */
    while (isspace((unsigned char)(*end)))
        end++;

    /* If the string does not end here, it has garbage in it. */
    if (*end != '\0')
        return -1;

    if (to)
        *to = (uint64_t)value;

    return 0;
}

int main(int argc, char *argv[])
{
    uint64_t kmin, kmax, dmin, dmax, k, count;

    if (argc != 5) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help | help ]\n", argv[0]);
        fprintf(stderr, "       %s MIN MAX MIN_DIVISOR MAX_DIVISOR\n", argv[0]);
        fprintf(stderr, "\n");
        fprintf(stderr, "This program counts which positive integers between MIN and MAX,\n");
        fprintf(stderr, "inclusive, are divisible by MIN_DIVISOR to MAX_DIVISOR, inclusive.\n");
        fprintf(stderr, "\n");
        return EXIT_SUCCESS;
    }

    /* Use current locale. This may change which codes isspace() considers whitespace. */
    if (setlocale(LC_ALL, "") == NULL)
        fprintf(stderr, "Warning: Your C library does not support your current locale.\n");

    if (parse_u64(argv[1], &kmin) || kmin < 1) {
        fprintf(stderr, "%s: Invalid minimum positive integer to test.\n", argv[1]);
        return EXIT_FAILURE;
    }
    if (parse_u64(argv[2], &kmax) || kmax < kmin || kmax >= UINT64_MAX) {
        fprintf(stderr, "%s: Invalid maximum positive integer to test.\n", argv[2]);
        return EXIT_FAILURE;
    }
    if (parse_u64(argv[3], &dmin) || dmin < 2) {
        fprintf(stderr, "%s: Invalid minimum divisor to test for.\n", argv[3]);
        return EXIT_FAILURE;
    }
    if (parse_u64(argv[4], &dmax) || dmax < dmin) {
        fprintf(stderr, "%s: Invalid maximum divisor to test for.\n", argv[4]);
        return EXIT_FAILURE;
    }

    count = 0;
    for (k = kmin; k <= kmax; k++)
        count += is_divisible(k, dmin, dmax);

    printf("%" PRIu64 "\n", count);
    return EXIT_SUCCESS;
}

It is useful to note that the above, running bolov's test, i.e. thisprogram 1 500000 100000 150000 only takes about 15 ms of wall clock time (13 ms CPU time), median, on a much slower Core i5-7200U processor. For really large numbers, like 280,000,000,000 to 280,000,010,000, the test does the maximum amount of work, and takes about 3.5 seconds per 10,000 numbers on this machine.

In other words, I wouldn't trust bolov's numbers to have any relation to timings for properly written test cases.

It is important to note that for any K between 1 and 500,000, the same test that bolov says their code measures, the above code does at most two divisibility tests to find if K is divisible by an integer between 100,000 and 150,000.

This solution is therefore quite efficient. It is definitely acceptable and near-optimal, when the tested K are relatively small (say, 32 bit unsigned integers or smaller), or when precomputed tables cannot be used.

Even when precomputed tables can be used, it is unclear if/when prime factorization becomes faster than the direct checks. There is certainly a tradeoff in the size and content of the precomputed tables. bolov claims that it is clearly superior to other methods, but hasn't implemented a proper "naive" divisibility test as shown above, and bases their opinion on experiments on quite small integers (1 to 500,000) that have simple prime decompositions.

As an example, a table of integers 1 to 500,000 pre-checked for divisibility takes only 62500 bytes (43750 bytes for 150,000 to 500,000). With that table, each test takes a small near-constant time (that only depends on memory and cache effects). Extending it to all 32-bit unsigned integers would require 512 GiB (536,870,912 bytes); the table can be stored in a memory-mapped read-only file, to let the OS kernel manage how much of it is mapped to RAM at any time.

Prime decomposition itself, especially using trial division, becomes more expensive than the naive approach when the number of trial divisions exceeds the range of possible divisors (50,000 divisors in this particular case). As there are 13848 primes (if one counts 1 and 2 as primes) between 1 and 150,000, the number of trial divisions can easily approach the number of divisors for sufficiently large input values.

For numbers with many prime factors, the combinatoric phase, finding if any subset of the prime factors multiply to a number between 100,000 and 150,000 is even more problematic. The number of possible combinations grows faster than exponentially. Without careful checks, this phase alone can do way more work per large input number than just trial division with each possible divisor would be.

(As an example, if you have 16 different prime factors, you already have 65,535 different combinations; more than the number of direct trial divisions. However, all such numbers are larger than 64-bit; the smallest being 2·3·5·7·11·13·17·19·23·29·31·37·41·43·47·53 = 32,589,158,477,190,044,730 which is a 65-bit number.)

There is also the problem of code complexity. The more complex the code, the harder it is to debug and maintain.

like image 157
Nominal Animal Avatar answered Feb 06 '23 10:02

Nominal Animal