Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the most efficient way to find factors in a list?

What I'm looking to do:

I need to make a function that, given a list of positive integers (there can be duplicate integers), counts all triples (in the list) in which the third number is a multiple of the second and the second is a multiple of the first:

(The same number cannot be used twice in one triple, but can be used by all other triples)

For example, [3, 6, 18] is one because 18 goes evenly into 6 which goes evenly into 3.

So given [1, 2, 3, 4, 5, 6] it should find:

[1, 2, 4] [1, 2, 6] [1, 3, 6]

and return 3 (the number of triples it found)

What I've tried:

I made a couple of functions that work but are not efficient enough. Is there some math concept I don't know about that would help me find these triples faster? A module with a function that does better? I don't know what to search for...

def foo(q):
    l = sorted(q)
    ln = range(len(l))
    for x in ln:
        if len(l[x:]) > 1:
            for y in ln[x + 1:]:
                if (len(l[y:]) > 0) and (l[y] % l[x] == 0):
                    for z in ln[y + 1:]:
                        if l[z] % l[y] == 0:
                            ans += 1
    return ans

This one is a bit faster:

def bar(q):
    l = sorted(q)
    ans = 0
    for x2, x in enumerate(l):
        pool = l[x2 + 1:]
        if len(pool) > 1:
            for y2, y in enumerate(pool):
                pool2 = pool[y2 + 1:]
                if pool2 and (y % x == 0):
                    for z in pool2:
                        if z % y == 0:
                            ans += 1
    return ans

Here's what I've come up with with help from y'all but I must be doing something wrong because it get's the wrong answer (it's really fast though):

def function4(numbers):
    ans = 0
    num_dict = {}
    index = 0
    for x in numbers:
        index += 1
        num_dict[x] = [y for y in numbers[index:] if y % x == 0]

    for x in numbers:
        for y in num_dict[x]:
            for z in num_dict[y]:
                print(x, y, z)
                ans += 1

    return ans

(39889 instead of 40888) - oh, I accidentally made the index var start at 1 instead of 0. It works now.

Final Edit

I've found the best way to find the number of triples by reevaluating what I needed it to do. This method doesn't actually find the triples, it just counts them.

def foo(l):
    llen = len(l)
    total = 0
    cache = {}
    for i in range(llen):
        cache[i] = 0
    for x in range(llen):
        for y in range(x + 1, llen):
            if l[y] % l[x] == 0:
                cache[y] += 1
                total += cache[x]
    return total

And here's a version of the function that explains the thought process as it goes (not good for huge lists though because of spam prints):

def bar(l):
    list_length = len(l)
    total_triples = 0
    cache = {}
    for i in range(list_length):
        cache[i] = 0
    for x in range(list_length):
        print("\n\nfor index[{}]: {}".format(x, l[x]))
        for y in range(x + 1, list_length):
            print("\n\ttry index[{}]: {}".format(y, l[y]))
            if l[y] % l[x] == 0:
                print("\n\t\t{} can be evenly diveded by {}".format(l[y], l[x]))
                cache[y] += 1
                total_triples += cache[x]
                print("\t\tcache[{0}] is now {1}".format(y, cache[y]))
                print("\t\tcount is now {}".format(total_triples))
                print("\t\t(+{} from cache[{}])".format(cache[x], x))
            else:
                print("\n\t\tfalse")
    print("\ntotal number of triples:", total_triples)
like image 831
Elliot Roberts Avatar asked Oct 08 '16 20:10

Elliot Roberts


2 Answers

Right now your algorithm has O(N^3) running time, meaning that every time you double the length of the initial list the running time goes up by 8 times.

In the worst case, you cannot improve this. For example, if your numbers are all successive powers of 2, meaning that every number divides every number grater than it, then every triple of numbers is a valid solution so just to print out all the solutions is going to be just as slow as what you are doing now.

If you have a lower "density" of numbers that divide other numbers, one thing you can do to speed things up is to search for pairs of numbers instead of triples. This will take time that is only O(N^2), meaning the running time goes up by 4 times when you double the length of the input list. Once you have a list of pairs of numbers you can use it to build a list of triples.

# For simplicity, I assume that a number can't occur more than once in the list.
# You will need to tweak this algorithm to be able to deal with duplicates.

# this dictionary will map each number `n` to the list of other numbers
# that appear on the list that are multiples of `n`.
multiples = {}
for n in numbers:
   multiples[n] = []

# Going through each combination takes time O(N^2)
for x in numbers:
   for y in numbers:
     if x != y and y % x == 0:
         multiples[x].append(y)

# The speed on this last step will depend on how many numbers
# are multiples of other numbers. In the worst case this will
# be just as slow as your current algoritm. In the fastest case
# (when no numbers divide other numbers) then it will be just a
# O(N) scan for the outermost loop.
for x in numbers:
    for y in multiples[x]:
        for z in multiples[y]:
            print(x,y,z)

There might be even faster algorithms, that also take advantage of algebraic properties of division but in your case I think a O(N^2) is probably going to be fast enough.

like image 142
hugomg Avatar answered Sep 18 '22 02:09

hugomg


the key insight is:

if a divides b, it means a "fits into b". if a doesn't divide c, then it means "a doesn't fit into c". And if a can't fit into c, then b cannot fit into c (imagine if b fitted into c, since a fits into b, then a would fit into all the b's that fit into c and so a would have to fit into c too.. (think of prime factorisation etc))

this means that we can optimise. If we sort the numbers smallest to largest and start with the smaller numbers first. First iteration, start with the smallest number as a If we partition the numbers into two groups, group 1, the numbers which a divides, and group 2 the group which a doesn't divide, then we know that no numbers in group 1 can divide numbers in group 2 because no numbers in group 2 have a as a factor.

so if we had [2,3,4,5,6,7], we would start with 2 and get: [2,4,6] and [3,5,7] we can repeat the process on each group, splitting into smaller groups. This suggests an algorithm that could count the triples more efficiently. The groups will get really small really quickly, which means its efficiency should be fairly close to the size of the output.

like image 20
robert king Avatar answered Sep 20 '22 02:09

robert king