Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Organizing list of tuples

I have a list of tuples which I create dynamically.

The list appears as:

List = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]

Each tuple (a, b) of list represents the range of indexes from a certain table.

The ranges (a, b) and (b, d) is same in my situation as (a, d)

I want to merge the tuples where the 2nd element matches the first of any other.

So, in the example above, I want to merge (8, 10), (10,13) to obtain (8,13) and remove (8, 10), (10,13)

(19,25) and (25,30) merge should yield (19, 30)

I don't have a clue where to start. The tuples are non overlapping.

Edit: I have been trying to just avoid any kind of for loop as I have a pretty large list

like image 666
Zanam Avatar asked Oct 06 '16 19:10

Zanam


2 Answers

If you need to take into account things like skovorodkin's example in the comment,

[(1, 4), (4, 8), (8, 10)]

(or even more complex examples), then one way to do efficiently would be using graphs.

Say you create a digraph (possibly using networkx), where each pair is a node, and there is an edge from (a, b) to node (c, d) if b == c. Now run topological sort, iterate according to the order, and merge accordingly. You should take care to handle nodes with two (or more) outgoing edges properly.


I realize your question states you'd like to avoid loops on account of the long list size. Conversely, for long lists, I doubt you'll find even an efficient linear time solution using list comprehension (or something like that). Note that you cannot sort the list in linear time, for example.


Here is a possible implementation:

Say we start with

l = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]

It simplifies the following to remove duplicates, so let's do:

l = list(set(l))

Now to build the digraph:

import networkx as nx
import collections

g = nx.DiGraph()

The vertices are simply the pairs:

g.add_nodes_from(l)

To build the edges, we need a dictionary:

froms = collections.defaultdict(list)
for p in l:
    froms[p[0]].append(p)

Now we can add the edges:

for p in l:
    for from_p in froms[p[1]]:
        g.add_edge(p, from_p)

Next two lines are unneeded - they're just here to show what the graph looks like at this point:

>>> g.nodes()
[(25, 30), (14, 16), (10, 13), (8, 10), (1, 4), (19, 25)]

>>> g.edges()
[((8, 10), (10, 13)), ((19, 25), (25, 30))]

Now, let's sort the pairs by topological sort:

l = nx.topological_sort(g)

Finally, here's the tricky part. The result will be a DAG. We have to to traverse things recursively, but remember what we visited already.

Let's create a dict of what we visited:

visited = {p: False for p in l}

Now a recursive function, that given a node, returns the maximum range edge from any node reachable from it:

def visit(p):
    neighbs = g.neighbors(p)
    if visited[p] or not neighbs:
        visited[p] = True
        return p[1]
    mx = max([visit(neighb_p) for neighb_p in neighbs])
    visited[p] = True
    return mx

We're all ready. Let's create a list for the final pairs:

final_l = []

and visit all nodes:

for p in l:
    if visited[p]:
        continue
    final_l.append((p[0], visit(p)))

Here's the final result:

>>> final_l
[(1, 4), (8, 13), (14, 16)]
like image 185
Ami Tavory Avatar answered Oct 12 '22 11:10

Ami Tavory


If they don't overlap, then you can sort them, and then just combine adjacent ones.

Here's a generator that yields the new tuples:

def combine_ranges(L):
    L = sorted(L)  # Make a copy as we're going to remove items!
    while L:
        start, end = L.pop(0)  # Get the first item
        while L and L[0][0] == end:
            # While the first of the rest connects to it, adjust
            # the end and remove the first of the rest
            _, end = L.pop(0)
        yield (start, end)

print(list(combine_ranges(List)))

If speed is important, use a collections.deque instead of a list, so that the .pop(0) operations can be in constant speed.

like image 40
RemcoGerlich Avatar answered Oct 12 '22 10:10

RemcoGerlich