<h3>Problem</h3> Say you have <code>n</code> lists of integers, where each list only includes integers in the range from <code>1</code> to <code>n</code>. For example, for <code>n = 4</code>, we might have: <pre class="prettyprint"><code>a_1 = [1, 2] a_2 = [3] a_3 = [4, 1, 1] a_4 = [2, 3] </code></pre> Now my question is: Can I tick off all the integers between <code>1</code> and <code>n</code> in those <code>n</code> lists but with the catch that once I find one number, I can't use that list anymore to find subsequent numbers? For example, in the above example with <code>n = 4</code>, I can choose 1 from <code>a_1</code>, 2 from <code>a_4</code>, 3 from <code>a_2</code>, and 4 for <code>a_3</code>, and I have therefore filled all the numbers from 1 to 4 but only using each list once. An example where I can't find the range (and hence should return <code>False</code>) would be: <pre class="prettyprint"><code>a_1 = [1, 2] a_2 = [3, 3, 5] a_3 = [4] a_4 = [5] a_5 = [3, 4, 5] </code></pre> The reason is because if I choose 1 from a_1, I can't choose 2 anymore from any list. <h3>Approach</h3> This is my current straightforward approach. I make a cartesian product of the lists and check if there is any that, sorted, will be a range. <pre class="prettyprint"><code>import itertools def fillRange(lists): cartesian = itertools.product(*lists) return any([sorted(comb) == range(1, len(lists) + 1) for comb in cartesian]) </code></pre> <h3>Question</h3> Although my approach works, for large lists it becomes a bit inefficient. Any thoughts on how I can improve this algorithm? Thanks!

You can formulate this as a maximum flow problem in a bipartite graph where the left nodes correspond to lists, and the right nodes correspond to integers 1 to n. There is an edge in the graph iff the integer is in the corresponding list. All capacities in the graph are equal to 1. If you can find a flow of size n from the left side to the right side then the problem is soluble. Python code to do this below: <pre class="prettyprint"><code>import networkx as nx a_1 = [1, 2] a_2 = [2] a_3 = [4, 1, 1] a_4 = [2, 3] A = [a_1,a_2,a_3,a_4] n = 4 G=nx.DiGraph() for i,a in enumerate(A): for j in set(a): l = 'list'+str(i) G.add_edge(l,j,capacity=1) G.add_edge('start',l,capacity=1) for j in range(1,n+1): G.add_edge(j,'dest',capacity=1) v,flow = nx.maximum_flow(G,'start','dest') if v<n: print 'Impossible' else: for i,a in enumerate(A): for j in set(a): if flow['list'+str(i)][j]>0: print 'Use',j,'from list',a </code></pre> This prints: <pre class="prettyprint"><code>Use 1 from list [1, 2] Use 2 from list [2] Use 4 from list [4, 1, 1] Use 3 from list [2, 3] </code></pre>

Instead of testing all the combinations in order, you can speed this up a lot by testing the most-constrained lists first, and also updating the alternatives in the other lists as you add elements to your solution set. This way, you can "solve" both your examples without backtracking once. <pre class="prettyprint"><code>def search(n, lists): if n == 0: yield [] else: lists = [l for l in lists if l != []] if len(lists) >= n: least = min(lists, key=len) for val in least: new = [[x for x in lst if x != val] for lst in lists if lst is not least] for res in search(n-1, new): yield [val] + res </code></pre> Here's some debugging/tracing output for your two examples to help with the understanding. First value is <code>n</code>, then the <code>lists</code>, and finally the previously chosen <code>val</code>. <pre class="prettyprint"><code>4 [[1, 2], [3], [4, 1, 1], [2, 3]] None 3 [[1, 2], [4, 1, 1], [2]] 3 2 [[1], [4, 1, 1]] 2 1 [[4]] 1 0 [] 4 --> solution [3, 2, 1, 4] 5 [[1, 2], [3, 3, 5], [4], [5], [3, 4, 5]] None 4 [[1, 2], [3, 3, 5], [5], [3, 5]] 4 3 [[1, 2], [3, 3], [3]] 5 2 [[1, 2], []] 3 --> abort </code></pre> <hr> If you also want the indices of the lists the elements have been taken from, the code gets a little more complicated, but not much: <pre class="prettyprint"><code>def search(n, lists): if n == 0: yield [] else: if sum(1 for l in lists if l) >= n: i = min(range(len(lists)), key=lambda x: (lists[x] == [], len(lists[x]))) for val in lists[i]: new = [[x for x in lst if x != val] if lst is not lists[i] else [] for lst in lists] for res in search(n-1, new): yield [(i, val)] + res </code></pre> Result for your first example then is <code>[(1, 3), (3, 2), (0, 1), (2, 4)]</code>

Improving algorithm that uses the cartesian product

Problem

Say you have n lists of integers, where each list only includes integers in the range from 1 to n. For example, for n = 4, we might have:

a_1 = [1, 2]
a_2 = [3]
a_3 = [4, 1, 1]
a_4 = [2, 3]

Now my question is: Can I tick off all the integers between 1 and n in those n lists but with the catch that once I find one number, I can't use that list anymore to find subsequent numbers?

For example, in the above example with n = 4, I can choose 1 from a_1, 2 from a_4, 3 from a_2, and 4 for a_3, and I have therefore filled all the numbers from 1 to 4 but only using each list once.

An example where I can't find the range (and hence should return False) would be:

a_1 = [1, 2]
a_2 = [3, 3, 5]
a_3 = [4]
a_4 = [5]
a_5 = [3, 4, 5]

The reason is because if I choose 1 from a_1, I can't choose 2 anymore from any list.

Approach

This is my current straightforward approach. I make a cartesian product of the lists and check if there is any that, sorted, will be a range.

import itertools

def fillRange(lists):
  cartesian = itertools.product(*lists)
  return any([sorted(comb) == range(1, len(lists) + 1) for comb in cartesian])

Question

Although my approach works, for large lists it becomes a bit inefficient. Any thoughts on how I can improve this algorithm?

Thanks!

563

asked Aug 25 '17 13:08

McGuire

2 Answers

You can formulate this as a maximum flow problem in a bipartite graph where the left nodes correspond to lists, and the right nodes correspond to integers 1 to n.

There is an edge in the graph iff the integer is in the corresponding list.

All capacities in the graph are equal to 1.

If you can find a flow of size n from the left side to the right side then the problem is soluble.

Python code to do this below:

import networkx as nx

a_1 = [1, 2]
a_2 = [2]
a_3 = [4, 1, 1]
a_4 = [2, 3]
A = [a_1,a_2,a_3,a_4]
n = 4

G=nx.DiGraph()
for i,a in enumerate(A):
    for j in set(a):
        l = 'list'+str(i)
        G.add_edge(l,j,capacity=1)
        G.add_edge('start',l,capacity=1)
for j in range(1,n+1):
    G.add_edge(j,'dest',capacity=1)
v,flow = nx.maximum_flow(G,'start','dest')
if v<n:
    print 'Impossible'
else:
    for i,a in enumerate(A):
        for j in set(a):
            if flow['list'+str(i)][j]>0:
                print 'Use',j,'from list',a

This prints:

Use 1 from list [1, 2]
Use 2 from list [2]
Use 4 from list [4, 1, 1]
Use 3 from list [2, 3]

154

answered Oct 15 '22 10:10

Peter de Rivaz

Instead of testing all the combinations in order, you can speed this up a lot by testing the most-constrained lists first, and also updating the alternatives in the other lists as you add elements to your solution set. This way, you can "solve" both your examples without backtracking once.

def search(n, lists):
    if n == 0:
        yield []
    else:
        lists = [l for l in lists if l != []]
        if len(lists) >= n:
            least = min(lists, key=len)
            for val in least:
                new = [[x for x in lst if x != val] for lst in lists if lst is not least]
                for res in search(n-1, new):
                    yield [val] + res

Here's some debugging/tracing output for your two examples to help with the understanding. First value is n, then the lists, and finally the previously chosen val.

4 [[1, 2], [3], [4, 1, 1], [2, 3]] None
3 [[1, 2], [4, 1, 1], [2]] 3
2 [[1], [4, 1, 1]] 2
1 [[4]] 1
0 [] 4 --> solution
[3, 2, 1, 4]

5 [[1, 2], [3, 3, 5], [4], [5], [3, 4, 5]] None
4 [[1, 2], [3, 3, 5], [5], [3, 5]] 4
3 [[1, 2], [3, 3], [3]] 5
2 [[1, 2], []] 3 --> abort

If you also want the indices of the lists the elements have been taken from, the code gets a little more complicated, but not much:

def search(n, lists):
    if n == 0:
        yield []
    else:
        if sum(1 for l in lists if l) >= n:
            i = min(range(len(lists)), key=lambda x: (lists[x] == [], len(lists[x])))
            for val in lists[i]:
                new = [[x for x in lst if x != val] if lst is not lists[i] else [] for lst in lists]
                for res in search(n-1, new):
                    yield [(i, val)] + res

Result for your first example then is [(1, 3), (3, 2), (0, 1), (2, 4)]

answered Oct 15 '22 09:10

tobias_k

Related questions
                            
                                How to show the count values on the top of a bar in a countplot?
                            
                                How to use xml sax parser to read and write a large xml?
                            
                                Tweepy.cursor multiple / OR logic function for query terms
                            
                                Python, Bokeh: How to turn off auto-update of axes
                            
                                Declaring new variables inside class methods
                            
                                How to optimize a sklearn pipeline, using XGboost, for a different `eval_metric`?
                            
                                Unexpected 32-bit integer overflow in pandas/numpy int64 (python 3.6)
                            
                                import matplotlib failing on Heroku
                            
                                Save a pivottablejs figure to file
                            
                                Broadcast 1D array against 2D array for lexsort : Permutation for sorting each column independently when considering yet another vector
                            
                                how to pipe multiple sql- and py-scripts
                            
                                Adding to sqlalchemy mapping class non db attributes
                            
                                Windows 10 conda is not recognized as an internal or external command
                            
                                Passing a list as a url value to urlopen
                            
                                django.core.exceptions.ValidationError: ["'' is not a valid UUID."]
                            
                                Select and modify a slice in pandas dataframe by integer index
                            
                                Correct use of PEP 508 environment markers in setup.cfg
                            
                                Scrapy Splash Screenshots?
                            
                                Speeding up calculation of nearby groups?
                            
                                Python mix-in enumerations as dictionary key: how the type is converted?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Improving algorithm that uses the cartesian product

Tags:

python

algorithm

optimization

cartesian-product