Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace list of list with "condensed" list of list while maintaining order

Tags:

I have a list of list as in the code I attached. I want to link each sub list if there are any common values. I then want to replace the list of list with a condensed list of list. Examples: if I have a list [[1,2,3],[3,4]] I want [1,2,3,4]. If I have [[4,3],[1,2,3]] I want [4,3,1,2]. If I have [[1,2,3],[a,b],[3,4],[b,c]] I want [[1,2,3,4],[a,b,c]] or [[a,b,c],[1,2,3,4]] I don't care which one.

I am almost there...

My problem is when I have a case like [[1,2,3],[10,5],[3,8,5]] I want [1,2,3,10,5,8] but with my current code I get [1,2,3,8,10,5]

Here is my code:

import itertools  a = [1,2,3] b = [3,4] i = [21,22] c = [88,7,8] e = [5,4] d = [3, 50] f = [8,9] g=  [9,10] h = [20,21]  lst = [a,b,c,i,e,d,f,g,h,a,c,i]*1000   #I have a lot of list but not very many different lists  def any_overlap(a, b):   sb = set(b)   return any(itertools.imap(sb.__contains__, a))  def find_uniq(lst):     ''' return the uniq parts of lst'''     seen = set()     seen_add = seen.add     return [ x for x in lst if x not in seen and not seen_add(x)]  def overlap_inlist(o_lst, lstoflst):     '''     Search for overlap, using "any_overlap", of a list( o_lst) in a list of lists (lstoflst).     If there is overlap add the uniq part of the found list to the search list, and keep      track of where that list was found      '''     used_lst =[ ]     n_lst =[ ]     for lst_num, each_lst in enumerate(lstoflst):         if any_overlap(o_lst,each_lst):             n_lst.extend(each_lst)             used_lst.append(lst_num)     n_lst= find_uniq(n_lst)     return  n_lst, used_lst  def comb_list(lst):     '''     For each list in a list of list find all the overlaps using 'ovelap_inlist'.     Update the list each time to delete the found lists. Return the final combined list.      '''     for updated_lst in lst:         n_lst, used_lst = overlap_inlist(updated_lst,lst)         lst[:] = [x for i,x in enumerate(lst) if i not in used_lst]         lst.insert(0,n_lst)     return lst comb_lst = comb_list(lst) print comb_lst 

The out put from this script is:

[[88, 7, 8, 9, 10], [1, 2, 3, 4, 50, 5], [21, 22, 20]] 

I want it so the key are in the original order like:

[[88, 7, 8, 9, 10], [1, 2, 3, 4, 5, 50,], [21, 22, 20]] 

The 5 and 50 are switched in new lst[2]

I am somewhat new to python. I would appreciate any solutions to the problem or comments on my current code. I am not a computer scientists, I imagine there may be some kind of algorithm that does this quickly( maybe from set theory? ). If there is such an algorithm please point me to the right direction.

I may be making this way more complicated then it is... Thank you!!

like image 883
Keith Avatar asked Dec 05 '12 01:12

Keith


People also ask

Does list append preserve order?

Yes, the order of elements in a python list is persistent.

Can you store a list inside another list?

We can have a list of many types in Python, like strings, numbers, and more. Python also allows us to have a list within a list called a nested list or a two-dimensional list.

Can we store list in list in Python?

Lists can contain any Python object, including lists (i.e., list of lists). Lists are indexed and sliced with square brackets (e.g., list[0] and list[2:9]), in the same way as strings and arrays.


1 Answers

Here's a brute-force approach (it might be easier to understand):

from itertools import chain  def condense(*lists):     # remember original positions     positions = {}     for pos, item in enumerate(chain(*lists)):         if item not in positions:             positions[item] = pos      # condense disregarding order     sets = condense_sets(map(set, lists))      # restore order     result = [sorted(s, key=positions.get) for s in sets]     return result if len(result) != 1 else result[0]  def condense_sets(sets):     result = []     for candidate in sets:         for current in result:             if candidate & current:   # found overlap                 current |= candidate  # combine (merge sets)                  # new items from candidate may create an overlap                 # between current set and the remaining result sets                 result = condense_sets(result) # merge such sets                 break         else:  # no common elements found (or result is empty)             result.append(candidate)     return result 

Example

>>> condense([1,2,3], [10,5], [3,8,5]) [1, 2, 3, 10, 5, 8] >>> a = [1,2,3] >>> b = [3,4] >>> i = [21,22] >>> c = [88,7,8] >>> e = [5,4] >>> d = [3, 50] >>> f = [8,9] >>> g=  [9,10] >>> h = [20,21] >>> condense(*[a,b,c,i,e,d,f,g,h,a,c,i]*1000) [[1, 2, 3, 4, 5, 50], [88, 7, 8, 9, 10], [21, 22, 20]] >>> condense([1], [2, 3, 2]) [[1], [2, 3]] 

Performance comparison

condense_*() functions are from the answers to this question. lst_OP input list from the question (different size), lst_BK - the test list from @Blckknght's answer (different size). See the source.

Measurements show that solutions based on "disjoint sets" and "connected components of undirected graph" concepts perform similar on both types of input.

name                       time ratio comment condense_hynekcer     5.79 msec  1.00 lst_OP condense_hynekcer2     7.4 msec  1.28 lst_OP condense_pillmuncher2 11.5 msec  1.99 lst_OP condense_blckknght    16.8 msec  2.91 lst_OP condense_jfs            26 msec  4.49 lst_OP condense_BK           30.5 msec  5.28 lst_OP condense_blckknght2   30.9 msec  5.34 lst_OP condense_blckknght3   32.5 msec  5.62 lst_OP   name                       time  ratio comment condense_blckknght     964 usec   1.00 lst_BK condense_blckknght2   1.41 msec   1.47 lst_BK condense_blckknght3   1.46 msec   1.51 lst_BK condense_hynekcer2    1.95 msec   2.03 lst_BK condense_pillmuncher2  2.1 msec   2.18 lst_BK condense_hynekcer     2.12 msec   2.20 lst_BK condense_BK           3.39 msec   3.52 lst_BK condense_jfs           544 msec 563.66 lst_BK   name                       time ratio comment condense_hynekcer     6.86 msec  1.00 lst_rnd condense_jfs          16.8 msec  2.44 lst_rnd condense_blckknght    28.6 msec  4.16 lst_rnd condense_blckknght2   56.1 msec  8.18 lst_rnd condense_blckknght3   56.3 msec  8.21 lst_rnd condense_BK           70.2 msec 10.23 lst_rnd condense_pillmuncher2  324 msec 47.22 lst_rnd condense_hynekcer2     334 msec 48.61 lst_rnd 

To reproduce results clone gist and run measure-performance-condense-lists.py

like image 148
jfs Avatar answered Oct 15 '22 08:10

jfs