Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating tuples of all possible combinations of items from two lists, without duplicating items within tuples

I would like to be able to take a range of numbers and return a list containing triples without duplicates. Each element of x should appear once in each position of the triples. The goal is to get something like the following:

get_combinations_without_duplicates(3) = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]

For range(3) this is just a list rotation, but for higher ranges, there are more possible combinations. I would like to be able to randomly generate a list of triples that satisfies these constraints.

Suppose we start by specifying the first element of each triple for the case where n=4:

[(0,), (1,), (2,), (3,)]

The second element of the first triple can be anything other than 0. Once one of these is chosen, then this limits the options for the next triple, and so on. The goal is to have a function that takes a number and creates the triples in this manner, but doesn't always create the same set of triples. That is, the end result could be a rotation:

[(0, 1, 2), (1, 2, 3), (2, 3, 0), (3, 0, 1),]

or

[(0, 2, 3), (1, 3, 0), (2, 0, 1), (3, 1, 2)]

Here is an implementation of this function:

def get_combinations_without_duplicates(n):
    output = []
    second = range(n)
     third = range(n)
for i in range(n):
    triple = [i]
    #Get the second value of the triple, but make sure that it isn't a 
    #duplicate of the first value
    #in the triple or any value that has appeared in the second position of any triple
    choices_for_second = [number for number in second if number not in triple]
    #Randomly select a number from the allowed possibilities
    n_second = random.choice(choices_for_second) 
    #Append it to the triple
    triple.append(n_second)
    #Remove that value from second so that it won't be chosen for other triples
    second = [number for number in second if number != n_second]
    #Do the same for the third value
    choices_for_third = [number for number in third if number not in triple]
    n_third = random.choice(choices_for_third)
    triple.append(n_third)
    third = [number for number in third if number != n_third]
    output.append(tuple(triple))
return output

As pointed out below, this process will sometimes randomly select combinations that don't work. That can be handled if you do something like:

def keep_trying(n):
    try:
        return get_combinations_without_duplicates(n)
    except IndexError:
        return keep_trying(n)

However, I'm wondering if there is a better way to do this in general.

like image 525
wdnsd Avatar asked Oct 23 '12 17:10

wdnsd


People also ask

How do you get all the combinations between two lists in Python?

The unique combination of two lists in Python can be formed by pairing each element of the first list with the elements of the second list. Method 1 : Using permutation() of itertools package and zip() function. Approach : Import itertools package and initialize list_1 and list_2.

Is tuple allow duplicates?

Tuple is a collection which is ordered and unchangeable. Allows duplicate members.

Can a tuple have more than 2 elements?

Suppose we wish to have a function return multiple values. We can do that by using a tuple containing two or more elements. A tuple is an ordered set of one or more elements, which can have different types.


2 Answers

Let's try this again.

A few observations.

  1. The first value will always be zero in a sorted array of your tuples.
  2. The length of the array will always be as long as the number of tuples that exist in your array.
  3. You want these to be randomly generated.
  4. The tuples are produced in 'sorted' order.

Based on these specifications, we can come up with a procedural method;

  1. Generate 2 lists of serial integers, one to pick from, the other to seed from.
  2. For each number in the seed list, [0, 1, 2, 3], randomly append and remove a number that's not already in the element. [01, 13, 20, 32]
  3. Generate another list of serial integers, and repeat. [012, 130, 203, 321]

But, this doesn't work. For some iterations, it will back itself into a corner and not be able to generate a number. For instance, [01, 13, 20, 32].. appending [3, 0, 1... crap, I'm stuck.

The only way to fix this, is to do a true shuffling over the entire row, and reshuffle until one fits. This may take quite a bit of time, and will only get more painful as the sets get longer.

So, procedurally speaking:

Solution 1: Random generation

  1. Populate a list with your range. [0, 1, 2, 3]
  2. Create another list. [0, 1, 2, 3]
  3. Shuffle the list. [1, 0, 2, 3]
  4. Shuffle until you find one that fits... [1, 2, 3, 0]
  5. Repeat with the third element.

With this procedure, while a computer can verify solutions very quickly, it cannot generate solutions very quickly. However, it is merely one of two ways to generate a truly random answer.

Therefore, the fastest guaranteed method would use make use of a verification procedure, rather than a generating procedure. First things first, generate all the possible permutations.

from itertools import permutations

n = 4
candidates = [i for i in permutations(xrange(n),3)]

Then.

Solution 2: Random verification

  1. Pick a triplet that starts with 0.
  2. Pop, at random, a triplet that does not start with 0.
  3. Verify if the randomly picked triplet is an intermediate solution.
  4. If not, pop another triplet.
  5. If yes, append the triplet, and REPOPULATE THE TRIPLET QUEUE.
  6. Repeat n times. # or until you exhaust the queue, at which point repeat n times naturally becomes TRUE

A solution for the next triplet is mathematically guaranteed to be in the solution set, so if you just let it exhaust itself, a randomized solution should appear. The problem with this approach is that there's no guarantee that every possible outcome has an equal probability.

Solution 3: Iterative verification

For equal probability results, get rid of the randomization, and generate every possible 3-tuple combination, n-lists long-- and verify each of those solution candidates.

Write a function to verify over the list of candidate solutions to produce every solution, and then randomly pop a solution from that list.

from itertools import combinations

results = [verify(i) for i in combinations(candidates, n)]
# this is 10626 calls to verify for n=4, 5 million for n=5 
# this is not an acceptable solution.  

Neither Solution 1 or 3 is very fast, O(n**2), but given your criteria, it's possible this is as fast as it'll get if you want a truly random solution. Solution 2 will guaranteed be the fastest of these three, often times significantly beating 1 or 3, Solution 3 has the most stable results. Which of these approaches you choose will depend on what you want to do with the output.

Afterward:

Ultimately, the speed of the code will be contingent on exactly how random you want your code to be. An algorithm to spit out the VERY first (and only the very first) instance of a tuple series that satisfies your requirement can run supremely quickly, as it just attacks the permutations in order, once, and it will run in O(n) time. However, it will not do anything randomly...

Also, here's some quick code for verify(i). It's based on the observation that two tuples may not have the same number in the same index.

def verify(t):
    """ Verifies that a set of tuples satisfies the combinations without duplicates condition. """
    zipt = zip(*t)
    return all([len(i) == len(set(i)) for i in zipt])

n = 4 Full Solution Set

((0, 1, 2), (1, 0, 3), (2, 3, 0), (3, 2, 1))
((0, 1, 2), (1, 0, 3), (2, 3, 1), (3, 2, 0))
((0, 1, 2), (1, 2, 3), (2, 3, 0), (3, 0, 1))
((0, 1, 2), (1, 3, 0), (2, 0, 3), (3, 2, 1))
((0, 1, 3), (1, 0, 2), (2, 3, 0), (3, 2, 1))
((0, 1, 3), (1, 0, 2), (2, 3, 1), (3, 2, 0))
((0, 1, 3), (1, 2, 0), (2, 3, 1), (3, 0, 2))
((0, 1, 3), (1, 3, 2), (2, 0, 1), (3, 2, 0))
((0, 2, 1), (1, 0, 3), (2, 3, 0), (3, 1, 2))
((0, 2, 1), (1, 3, 0), (2, 0, 3), (3, 1, 2))
((0, 2, 1), (1, 3, 0), (2, 1, 3), (3, 0, 2))
((0, 2, 1), (1, 3, 2), (2, 0, 3), (3, 1, 0))
((0, 2, 3), (1, 0, 2), (2, 3, 1), (3, 1, 0))
((0, 2, 3), (1, 3, 0), (2, 0, 1), (3, 1, 2))
((0, 2, 3), (1, 3, 2), (2, 0, 1), (3, 1, 0))
((0, 2, 3), (1, 3, 2), (2, 1, 0), (3, 0, 1))
((0, 3, 1), (1, 0, 2), (2, 1, 3), (3, 2, 0))
((0, 3, 1), (1, 2, 0), (2, 0, 3), (3, 1, 2))
((0, 3, 1), (1, 2, 0), (2, 1, 3), (3, 0, 2))
((0, 3, 1), (1, 2, 3), (2, 1, 0), (3, 0, 2))
((0, 3, 2), (1, 0, 3), (2, 1, 0), (3, 2, 1))
((0, 3, 2), (1, 2, 0), (2, 1, 3), (3, 0, 1))
((0, 3, 2), (1, 2, 3), (2, 0, 1), (3, 1, 0))
((0, 3, 2), (1, 2, 3), (2, 1, 0), (3, 0, 1))

n = 5 has 552 unique solutions. Here are the first 20.

((0, 1, 2), (1, 0, 3), (2, 3, 4), (3, 4, 0), (4, 2, 1))
((0, 1, 2), (1, 0, 3), (2, 3, 4), (3, 4, 1), (4, 2, 0))
((0, 1, 2), (1, 0, 3), (2, 4, 0), (3, 2, 4), (4, 3, 1))
((0, 1, 2), (1, 0, 3), (2, 4, 1), (3, 2, 4), (4, 3, 0))
((0, 1, 2), (1, 0, 4), (2, 3, 0), (3, 4, 1), (4, 2, 3))
((0, 1, 2), (1, 0, 4), (2, 3, 1), (3, 4, 0), (4, 2, 3))
((0, 1, 2), (1, 0, 4), (2, 4, 3), (3, 2, 0), (4, 3, 1))
((0, 1, 2), (1, 0, 4), (2, 4, 3), (3, 2, 1), (4, 3, 0))
((0, 1, 2), (1, 2, 0), (2, 3, 4), (3, 4, 1), (4, 0, 3))
((0, 1, 2), (1, 2, 0), (2, 4, 3), (3, 0, 4), (4, 3, 1))
((0, 1, 2), (1, 2, 3), (2, 0, 4), (3, 4, 0), (4, 3, 1))
((0, 1, 2), (1, 2, 3), (2, 0, 4), (3, 4, 1), (4, 3, 0))
((0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 0), (4, 0, 1))
((0, 1, 2), (1, 2, 3), (2, 4, 0), (3, 0, 4), (4, 3, 1))
((0, 1, 2), (1, 2, 3), (2, 4, 1), (3, 0, 4), (4, 3, 0))
((0, 1, 2), (1, 2, 4), (2, 0, 3), (3, 4, 0), (4, 3, 1))
((0, 1, 2), (1, 2, 4), (2, 0, 3), (3, 4, 1), (4, 3, 0))
((0, 1, 2), (1, 2, 4), (2, 3, 0), (3, 4, 1), (4, 0, 3))
((0, 1, 2), (1, 2, 4), (2, 3, 1), (3, 4, 0), (4, 0, 3))
((0, 1, 2), (1, 2, 4), (2, 4, 3), (3, 0, 1), (4, 3, 0))

So, you can generate solutions like this, but it takes time. If you were going to utilize this, I would cache the solutions generated as is, and then randomly pull from them when you need them for whatever number n. Incidentally, n=5 took a little under a minute to complete, brute force. Since the solution is O(n**2), I expect n=6 to take over an hour, n=7 over a day. The only way you can get return a true randomized solution is by doing it this way.

Edited: Random Solution without equal distribution:

The following is code I wrote in attempting to solve this problem, an implementation of Solution 2. I figured I would post it, since it is a partial, non-equal distribution solution, and generates every possible solution, guaranteed, given enough time.

def seeder(n):
    """ Randomly generates the first element in a solution. """
    seed = [0]
    a = range(1, n)
    for i in range(1, 3):
        seed.append(a.pop(random.randint(0,len(a)-1)))
    return [seed]

def extend_seed(seed, n):
    """ Semi-Randomly generates the next element in a solution. """
    next_seed = [seed[-1][0] + 1]
    candidates = range(0, n)
    for i in range(1, 3):
        c = candidates[:]
        for s in next_seed:
            c.remove(s)
        for s in seed:
            try:
                c.remove(s[i])
            except ValueError:
                pass
        next_seed.append(c.pop(random.randint(0,len(c)-1)))
    seed.append(next_seed)
    return seed

def combo(n):
    """ Extends seed until exhausted. 
    Some random generations return results shorter than n. """
    seed = seeder(n)
    while True:
        try:
            extend_seed(seed, n)
        except (ValueError, IndexError):
            return seed

def combos(n):
    """ Ensures that the final return is of length n. """
    while True:
        result = combo(n)
        if len(result) == n:
            return result
like image 105
17 revs Avatar answered Sep 23 '22 11:09

17 revs


You essentially want a Latin square, a n x n grid of numbers where each row and each column contains each number exactly once, except you only care about the first three numbers in each row (a Latin rectangle).

Update:

I've erased my ineffective code sample, because generating random Latin squares with equal probability is non-trivial, as discussed in a question on math.stackexchange.com.

The SAGE project implements the algorithm mentioned in that question, so you may look at the code for inspiration.

Alternatively, if you really want to get into the details, check out this paper for the specific case of generating random Latin rectangles.

like image 40
Mzzzzzz Avatar answered Sep 22 '22 11:09

Mzzzzzz