Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faster Python technique to count triples from a list of numbers that are multiples of each other

Suppose we have a list of numbers, l. I need to COUNT all tuples of length 3 from l, (l_i,l_j,l_k) such that l_i evenly divides l_j, and l_j evenly divides l_k. With the stipulation that the indices i,j,k have the relationship i<j<k

I.e.;

If l=[1,2,3,4,5,6], then the tuples would be [1,2,6], [1,3,6],[1,2,4], so the COUNT would be 3.

If l=[1,1,1], then the only tuple would be [1,1,1], so the COUNT would be 1.

Here's what I've done so far, using list comprehensions:

def myCOUNT(l):
    newlist=[[x,y,z] for x in l for y in l for z in l if (z%y==0 and y%x==0 and l.index(x)<l.index(y) and l.index(y)<l.index(z))]
    return len(newlist)

>>>l=[1,2,3,4,5,6]
>>>myCOUNT(l)
3

This works, but as l gets longer (and it can be as large as 2000 elements long), the time it takes increases too much. Is there a faster/better way to do this?

like image 304
Emm Gee Avatar asked Aug 11 '18 23:08

Emm Gee


People also ask

How many combinations of triplets are possible?

So, the correct answer is '64'.

What is lucky triples?

"Lucky triple" is defined as "In a list lst , for any combination of triple like (lst[i], lst[j], lst[k]) where i < j < k , where lst[i] divides lst[j] and lst[j] divides lst[k] .


4 Answers

We can count the number of triples with a given number in the middle by counting how many factors of that number are to its left, counting how many multiples of that number are to its right, and multiplying. Doing this for any given middle element is O(n) for a length-n list, and doing it for all n possible middle elements is O(n^2).

def num_triples(l):
    total = 0
    for mid_i, mid in enumerate(l):
        num_left = sum(1 for x in l[:mid_i] if mid % x == 0)
        num_right = sum(1 for x in l[mid_i+1:] if x % mid == 0)
        total += num_left * num_right
    return total

Incidentally, the code in your question doesn't actually work. It's fallen into the common newbie trap of calling index instead of using enumerate to get iteration indices. More than just being inefficient, this is actually wrong when the input has duplicate elements, causing your myCOUNT to return 0 instead of 1 on the [1, 1, 1] example input.

like image 59
user2357112 supports Monica Avatar answered Oct 18 '22 20:10

user2357112 supports Monica


Finding all tuples in O(n2)

You algorithm iterates over all possible combinations, which makes it O(n3).

Instead, you should precompute the division-tree of your list of numbers and recover triples from the paths down the tree.

Division tree

A division tree is a graph which nodes are numbers and children are the multiples of each number.

By example, given the list [1, 2, 3, 4], the division tree looks like this.

   1
  /|\
 2 | 3
  \|
   4

Computing the division tree requires to compare each number against all others, making its creation O(n2).

Here is a basic implementation of a division-tree that can be used for your problem.

class DivisionTree:
    def __init__(self, values):
        values = sorted(values)

        # For a division-tree to be connected, we need 1 to be its head
        # Thus we artificially add it and note whether it was actually in our numbers
        if 1 in values:
            self._has_one = True
            values = values[1:]
        else:
            self._has_one = False

        self._graph = {1: []}

        for v in values:
            self.insert(v)

    def __iter__(self):
        """Iterate over all values of the division-tree"""
        yield from self._graph

    def insert(self, n):
        """Insert value in division tree, adding it as child of each divisor"""
        self._graph[n] = []

        for x in self:
            if n != x and n % x == 0:
                self._graph[x].append(n)

    def paths(self, depth, _from=1):
        """Return a generator of all paths of *depth* down the division-tree"""
        if _from == 1:
            for x in self._graph[_from]:
                yield from self.paths(depth , _from=x)

        if depth == 1:
            yield [_from]
            return

        if _from != 1 or self._has_one:
            for x in self._graph[_from]:
                for p in self.paths(depth - 1, _from=x):
                    yield [_from, *p]

Usage

Once we built a DivisionTree, it suffices to iterate over all paths down the graph and select only those which have length 3.

Example

l = [1,2,3,4,5,6]

dt = DivisionTree(l)

for p in dt.paths(3):
    print(p)

Output

[1, 2, 4]
[1, 2, 6]
[1, 3, 6]

This solution assumes that the list of number is initially sorted, as in your example. Although, the output could be filtered with regard to the condition on indices i < j < k to provide a more general solution.

Time complexity

Generating the division-tree is O(n2).

In turn, there can be up to n! different paths, although stopping the iteration whenever we go deeper than 3 prevents traversing them all. This makes us iterate over the following paths:

  • the paths corresponding to three tuples, say there are m of them;

  • the paths corresponding to two tuples, there are O(n2) of them;

  • the paths corresponding to one tuples, there are O(n) of them.

Thus this overall yields an algorithm O(n2 + m).

like image 29
Olivier Melançon Avatar answered Oct 18 '22 19:10

Olivier Melançon


I suppose this solution without list comprehension will be faster (you can see analogue with list comprehension further):

a = [1, 2, 3, 4, 5, 6]

def count(a):   
    result = 0
    length = len(a)

    for i in range(length):
        for j in range(i + 1, length):
            for k in range(j + 1, length):
                if a[j] % a[i] == 0 and a[k] % a[j] == 0:
                    result += 1

    return result

print(count(a))

Output:

3

In your solution index method is too expensive (requires O(n) operations). Also you don't need to iterate over full list for each x, y and z (x = a[i], y = a[j], z = a[k]). Notice how I use indexes in my loops for y and z because I know that a.index(x) < a.index(y) < a.index(z) is always satisfied.


You can write it as one liner too:

def count(a):   
    length = len(a)

    return sum(1 for i in range(length)
               for j in range(i + 1, length)
               for k in range(j + 1, length)
               if a[j] % a[i] == 0 and a[k] % a[j] == 0)

P.S. Please, don't use l letter for variables names because it's very similar to 1:)

like image 3
Lev Zakharov Avatar answered Oct 18 '22 20:10

Lev Zakharov


There is a way to do this with itertools combinations:

from itertools import combinations 
l=[1,2,3,4,5,6] 

>>> [(x,y,z) for x,y,z in combinations(l,3) if z%y==0 and y%x==0]
[(1, 2, 4), (1, 2, 6), (1, 3, 6)]

Since combinations generates the tuples in list order, you do not then need to check the index of z.

Then your myCOUNT function becomes:

def cnt(li):
    return sum(1 for x,y,z in combinations(li,3) if z%y==0 and y%x==0)

>>> cnt([1,1,1])
1
>>> cnt([1,2,3,4,5,6])
3

This is a known problem.

Here are some timing for the solutions here:

from itertools import combinations 

class DivisionTree:
    def __init__(self, values):    
        # For a division-tree to be connected, we need 1 to be its head
        # Thus we artificially add it and note whether it was actually in our numbers
        if 1 in values:
            self._has_one = True
            values = values[1:]
        else:
            self._has_one = False

        self._graph = {1: []}

        for v in values:
            self.insert(v)

    def __iter__(self):
        """Iterate over all values of the division-tree"""
        yield from self._graph

    def insert(self, n):
        """Insert value in division tree, adding it as child of each divisor"""
        self._graph[n] = []

        for x in self:
            if n != x and n % x == 0:
                self._graph[x].append(n)

    def paths(self, depth, _from=1):
        """Return a generator of all paths of *depth* down the division-tree"""
        if _from == 1:
            for x in self._graph[_from]:
                yield from self.paths(depth , _from=x)

        if depth == 1:
            yield [_from]
            return

        if _from != 1 or self._has_one:
            for x in self._graph[_from]:
                for p in self.paths(depth - 1, _from=x):
                    yield [_from, *p]


def f1(li):
    return sum(1 for x,y,z in combinations(li,3) if z%y==0 and y%x==0)

def f2(l):
    newlist=[[x,y,z] for x in l for y in l for z in l if (z%y==0 and y%x==0 and l.index(x)<l.index(y) and l.index(y)<l.index(z))]
    return len(newlist)

def f3(a):   
    result = 0
    length = len(a)

    for i in range(length):
        for j in range(i + 1, length):
            for k in range(j + 1, length):
                if a[j] % a[i] == 0 and a[k] % a[j] == 0:
                    result += 1

    return result

def f4(l):
    dt = DivisionTree(l)
    return sum(1 for _ in dt.paths(3))  

def f5(l):
    total = 0
    for mid_i, mid in enumerate(l):
        num_left = sum(1 for x in l[:mid_i] if mid % x == 0)
        num_right = sum(1 for x in l[mid_i+1:] if x % mid == 0)
        total += num_left * num_right
    return total    


if __name__=='__main__':
    import timeit
    tl=list(range(3,155))
    funcs=(f1,f2,f3,f4,f5)
    td={f.__name__:f(tl) for f in funcs}
    print(td)
    for case, x in (('small',50),('medium',500),('large',5000)):
        li=list(range(2,x))
        print('{}: {} elements'.format(case,x))
        for f in funcs:
            print("   {:^10s}{:.4f} secs".format(f.__name__, timeit.timeit("f(li)", setup="from __main__ import f, li", number=1))) 

And the results:

{'f1': 463, 'f2': 463, 'f3': 463, 'f4': 463, 'f5': 463}
small: 50 elements
       f1    0.0010 secs
       f2    0.0056 secs
       f3    0.0018 secs
       f4    0.0003 secs
       f5    0.0002 secs
medium: 500 elements
       f1    1.1702 secs
       f2    5.3396 secs
       f3    1.8519 secs
       f4    0.0156 secs
       f5    0.0110 secs
large: 5000 elements
       f1    1527.4956 secs
       f2    6245.9930 secs
       f3    2074.2257 secs
       f4    1.3492 secs
       f5    1.2993 secs

You can see that f1,f2,f3 are clearly O(n^3) or worse and f4,f5 are O(n^2). f2 took more than 90 minutes for what f4 and f5 did in 1.3 seconds.

like image 1
dawg Avatar answered Oct 18 '22 21:10

dawg