
how to combine 2 lists uniquely

Tags: python, list

I'm working with extremely long lists and am trying to come up with an iterative solution for combining the 2 lists so that each unordered pair appears only once.

For example, I have lists

a = ['TF1', 'Tar1']
b = ['Tar1', 'TF1']

I want the following iterator (if possible) containing the tuples:

(TF1,Tar1)    
(TF1,TF1)  
(Tar1,Tar1)  

This excludes (Tar1,TF1) because the opposite ordering has already been added.

My current approach is to loop through each list and use a dictionary to keep track of what's been added. This takes up a huge amount of RAM because list a has 12,000 elements and list b has 15,000, so the resulting dictionary contains about len(a)*len(b)/2 entries, which in this case is 90M.

Any suggestions are appreciated. Thanks

user3417525 asked Oct 25 '14 00:10




1 Answer

Basically, the problem arises from the elements that are common to both lists. If you handle the combinations of common and unique elements as separate cases, the problem is solved.

That is, you need to create the following Cartesian products:

a_unique X b_unique
a_unique X b_common
a_common X b_unique
a_common X b_common 

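Of the four cases, only the last one poses a problem, as it would create non-unique pairs: both (x, y) and (y, x). On second thought, the unique pairs of that last product are simply the selections of 2 elements from a_common (which holds the same elements as b_common). Note that selecting 2 distinct elements leaves out pairs of an element with itself, such as (TF1,TF1) in the question; if those are wanted, itertools.combinations_with_replacement yields them as well.

Finally, segregating the elements can be done by creating a set from each of the lists and then comparing the two sets.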

>>> # Sample lists
>>> a = ['C0','C1','C2','A0','A1','A2']
>>> b = ['C0','C1','C2','B0','B1','B2']
>>> from itertools import product, combinations, chain
>>> # Create sets for O(1) lookup
>>> a_key = set(a)
>>> b_key = set(b)
>>> # Segregate elements into unique and common for both lists
>>> common = a_key & b_key
>>> a = {'common': common,
...      'unique': a_key - common}
>>> b = {'common': common,
...      'unique': b_key - common}
>>> # Create Cartesian products for all the cases
>>> # (sets are unordered, so the order of the pairs may vary between runs)
>>> list(chain.from_iterable([product(a['unique'], b['unique']),
...                           product(a['unique'], b['common']),
...                           product(a['common'], b['unique']),
...                           combinations(a['common'], 2)]))
[('A0', 'B0'), ('A0', 'B1'), ('A0', 'B2'), ('A1', 'B0'), ('A1', 'B1'), ('A1', 'B2'), ('A2', 'B0'), ('A2', 'B1'), ('A2', 'B2'), ('A0', 'C0'), ('A0', 'C1'), ('A0', 'C2'), ('A1', 'C0'), ('A1', 'C1'), ('A1', 'C2'), ('A2', 'C0'), ('A2', 'C1'), ('A2', 'C2'), ('C0', 'B0'), ('C0', 'B1'), ('C0', 'B2'), ('C1', 'B0'), ('C1', 'B1'), ('C1', 'B2'), ('C2', 'B0'), ('C2', 'B1'), ('C2', 'B2'), ('C0', 'C1'), ('C0', 'C2'), ('C1', 'C2')]
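Since the question asks for an iterator and its expected output includes self-pairs such as (TF1,TF1), a lazy variant of the same idea might look like the sketch below. The function name unique_pairs is made up for illustration. It uses combinations_with_replacement instead of combinations to include self-pairs, and it iterates over the original lists rather than the sets so the output order is reproducible. It assumes Python 3.7+ (for ordered dicts) and hashable elements.

```python
from itertools import chain, combinations_with_replacement, product


def unique_pairs(a, b):
    """Lazily yield each unordered pair (x from a, y from b) exactly once."""
    common = set(a) & set(b)
    # dict.fromkeys drops duplicates while keeping first-seen order,
    # so the result order does not depend on set iteration order.
    a_unique = [x for x in dict.fromkeys(a) if x not in common]
    b_unique = [x for x in dict.fromkeys(b) if x not in common]
    shared = [x for x in dict.fromkeys(a) if x in common]
    return chain(
        product(a_unique, b_unique),
        product(a_unique, shared),
        product(shared, b_unique),
        # Unordered pairs within the shared elements, self-pairs included.
        combinations_with_replacement(shared, 2),
    )


# The example from the question: every element is common to both lists.
for pair in unique_pairs(['TF1', 'Tar1'], ['Tar1', 'TF1']):
    print(pair)
# ('TF1', 'TF1')
# ('TF1', 'Tar1')
# ('Tar1', 'Tar1')
```

Nothing is stored beyond the three helper lists, so memory stays proportional to len(a) + len(b) as long as the result is consumed one pair at a time instead of being wrapped in list().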
Abhijit answered Oct 12 '22 20:10
