Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combine lists with common elements

Say I have for example the following nested list:

L = [['John','Sayyed'], ['John' , 'Simon'] ,['bush','trump'],
     ['Sam','Suri','NewYork'],['Suri','Orlando','Canada']]

How can I group these sublists, by getting the union of sublists which have a common element with at least another sublist within the group? So for the previous example the result should be:

[['John','Sayyed','Simon'] ,['bush','trump'],
 ['Sam','Suri','NewYork','Orlando','Canada']]

Thus the first two sublists are joined as they share 'John'. Could someone please share their valuable thoughts ?

like image 793
Laster Avatar asked Dec 21 '18 14:12

Laster


1 Answers

In many cases, modeling a problem as a graph, can make make fairly complicated tasks much easier. In this case, what we'd be looking for from a graph theory point of view, are the connected components of the graph.

So a simple way to go about this, is to generate a graph with NetworkX, and add your list as the graph edges using add_edges_from. Then use connected_components, which will precisely give you a list of sets of the connected components in the graph:

import networkx as nx 

L = [['John','Sayyed'], ['John' , 'Simon'] ,['bush','trump']]

G=nx.Graph()
G.add_edges_from(L)
list(nx.connected_components(G))

[{'John', 'Sayyed', 'Simon'}, {'bush', 'trump'}]

What about sublists with multiple (>2) items?

In the case of having sublists with more than 2 elements, you can add them as paths instead of nodes using nx.add_path, since they can connect multiple nodes:

L = [['John','Sayyed'], ['John' , 'Simon'] ,['bush','trump'],
     ['Sam','Suri','NewYork'],['Suri','Orlando','Canada']]

G=nx.Graph()
for l in L:
    nx.add_path(G, l)
list(nx.connected_components(G))

[{'John', 'Sayyed', 'Simon'},
 {'bush', 'trump'},
 {'Canada', 'NewYork', 'Orlando', 'Sam', 'Suri'}]

We can also vivisualize these connected components with nx.draw:

pos = nx.spring_layout(G, scale=20, k=2/np.sqrt(G.order()))
nx.draw(G, pos, node_color='lightgreen', node_size=1000, with_labels=True)

enter image description here


On connected components (graph theory)

More detailed explanation on connected components:

In graph theory, a connected component (or just component) of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph

So essentially, this code creates a graph, with edges from the list, where each edge is composed by two values u,v where u and v will be nodes connected by this edge.

And hence, the union of sublists with at least one sublist with a common element can be translated into a Graph Theory problem as all nodes that are reachable between each other through the existing paths.

like image 149
yatu Avatar answered Oct 20 '22 00:10

yatu