How can I detect common elements between lists and group lists with at least 1 common element?

I have a DataFrame with 1 column (plus the index) containing lists of sublists or elements. I would like to detect common elements in the lists/sublists and group the lists that share at least 1 common element, so that I end up with lists that have no elements in common with each other. The lists/sublists currently look like this (example for 4 rows):

       Num_ID
Row1   [['A1','A2','A3'],['A1','B1','B2','C3','D1']]
Row2   ['A1','E2','E3']
Row3   [['B4','B5','G4'],['B6','B4']]
Row4   ['B4','C9']

The expected result is n lists with no common elements (example for the first 2):

['A1','A2','A3','B1','B2','C3','D1','E2','E3']
['B4','B5','B6','C9','G4']
Jon1 asked Jun 20 '19 10:06


1 Answer

You can use NetworkX's connected_components function for this. Here's how I'd approach it, adapting this solution:

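import networkx as nx
import pandas as pd
from itertools import combinations, chain

# Sample data from the question: each row holds a flat list or a list of sublists
df = pd.DataFrame({'Num_ID': [[['A1','A2','A3'],['A1','B1','B2','C3','D1']],
                              ['A1','E2','E3'],
                              [['B4','B5','G4'],['B6','B4']],
                              ['B4','C9']]})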

Start by flattening the sublists in each list:

L = [[*chain.from_iterable(i)] if isinstance(i[0], list) else i 
       for i in df.Num_ID.values.tolist()]

[['A1', 'A2', 'A3', 'A1', 'B1', 'B2', 'C3', 'D1'],
 ['A1', 'E2', 'E3'],
 ['B4', 'B5', 'G4', 'B6', 'B4'],
 ['B4', 'C9']]
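The isinstance(i[0], list) check above assumes every row is non-empty and is either entirely flat or entirely made of sublists. If that assumption does not hold for your data, a more defensive helper might look like the sketch below. The name flatten_one_level is hypothetical, and it only handles one level of nesting, as in the question's data.

```python
def flatten_one_level(row):
    """Flatten a row that may mix plain elements and sublists (one level deep)."""
    flat = []
    for item in row:
        if isinstance(item, list):
            flat.extend(item)   # unpack a sublist
        else:
            flat.append(item)   # keep a plain element as-is
    return flat


# Rows shaped like the ones in the question, plus an empty row
rows = [[['A1', 'A2', 'A3'], ['A1', 'B1', 'B2', 'C3', 'D1']],
        ['A1', 'E2', 'E3'],
        []]
flattened = [flatten_one_level(r) for r in rows]
```

With a DataFrame, the same helper could be applied to df.Num_ID.values.tolist() in place of the list comprehension above.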

Given that each flattened list has at least 2 elements, you can take all the length-2 combinations from each list and use these as the network edges (note that an edge can only connect two nodes):

L2_nested = [list(combinations(l,2)) for l in L]
L2 = list(chain.from_iterable(L2_nested))
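As a possible variation (not part of the approach above), all pairwise combinations are not strictly needed to keep a list's elements connected: linking each element to the next one already puts them in the same component, with k - 1 edges per list instead of k(k - 1)/2. A minimal sketch, where path_edges and singletons are hypothetical names:

```python
# Flattened lists from the previous step, written out so the snippet runs on its own
L = [['A1', 'A2', 'A3', 'A1', 'B1', 'B2', 'C3', 'D1'],
     ['A1', 'E2', 'E3'],
     ['B4', 'B5', 'G4', 'B6', 'B4'],
     ['B4', 'C9']]

# Pair each element with its successor, so every list forms a path
path_edges = [pair for l in L for pair in zip(l, l[1:])]

# A single-element list produces no edge at all, so collect those elements
# separately if they should still show up as their own group
singletons = [l[0] for l in L if len(l) == 1]
```

path_edges could then be passed to add_edges_from in place of L2, and any singletons added with G.add_nodes_from so they are not silently dropped.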

Generate a graph and add these pairs as its edges using add_edges_from. Then use connected_components, which yields one set of nodes per connected component of the graph:

G = nx.Graph()
G.add_edges_from(L2)
list(nx.connected_components(G))

[{'A1', 'A2', 'A3', 'B1', 'B2', 'C3', 'D1', 'E2', 'E3'},
 {'B4', 'B5', 'B6', 'C9', 'G4'}]
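If adding a NetworkX dependency is not an option, the same grouping can be sketched in plain Python with a union-find (disjoint-set) structure. This is a swapped-in technique rather than the method above, and group_lists is a hypothetical name. It returns sorted lists, matching the format asked for in the question.

```python
def group_lists(lists):
    """Merge lists that share at least one element; return sorted groups."""
    parent = {}

    def find(x):
        # Walk up to the root, shortening the path along the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for items in lists:
        for item in items:
            parent.setdefault(item, item)  # register unseen elements as their own root
        for item in items[1:]:
            # Union every element of the list with the list's first element
            parent[find(item)] = find(items[0])

    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(g) for g in groups.values())


# Flattened lists from the question
L = [['A1', 'A2', 'A3', 'A1', 'B1', 'B2', 'C3', 'D1'],
     ['A1', 'E2', 'E3'],
     ['B4', 'B5', 'G4', 'B6', 'B4'],
     ['B4', 'C9']]

result = group_lists(L)
# [['A1', 'A2', 'A3', 'B1', 'B2', 'C3', 'D1', 'E2', 'E3'],
#  ['B4', 'B5', 'B6', 'C9', 'G4']]
```

Unlike the edge-based approach, this sketch also keeps single-element lists as their own group, since every element is registered before any union is attempted.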
yatu answered Oct 31 '22 17:10