I have a list of tuples which I create dynamically.
The list appears as:
List = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
Each tuple (a, b)
of list represents the range of indexes from a certain table.
The ranges (a, b) and (b, d)
is same in my situation as (a, d)
I want to merge the tuples where the 2nd element matches the first of any other.
So, in the example above, I want to merge (8, 10), (10,13)
to obtain (8,13)
and remove (8, 10), (10,13)
(19,25) and (25,30)
merge should yield (19, 30)
I don't have a clue where to start. The tuples are non overlapping.
Edit: I have been trying to just avoid any kind of for loop as I have a pretty large list
If you need to take into account things like skovorodkin's example in the comment,
[(1, 4), (4, 8), (8, 10)]
(or even more complex examples), then one way to do efficiently would be using graphs.
Say you create a digraph (possibly using networkx
), where each pair is a node, and there is an edge from (a, b) to node (c, d) if b == c. Now run topological sort, iterate according to the order, and merge accordingly. You should take care to handle nodes with two (or more) outgoing edges properly.
I realize your question states you'd like to avoid loops on account of the long list size. Conversely, for long lists, I doubt you'll find even an efficient linear time solution using list comprehension (or something like that). Note that you cannot sort the list in linear time, for example.
Here is a possible implementation:
Say we start with
l = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]
It simplifies the following to remove duplicates, so let's do:
l = list(set(l))
Now to build the digraph:
import networkx as nx
import collections
g = nx.DiGraph()
The vertices are simply the pairs:
g.add_nodes_from(l)
To build the edges, we need a dictionary:
froms = collections.defaultdict(list)
for p in l:
froms[p[0]].append(p)
Now we can add the edges:
for p in l:
for from_p in froms[p[1]]:
g.add_edge(p, from_p)
Next two lines are unneeded - they're just here to show what the graph looks like at this point:
>>> g.nodes()
[(25, 30), (14, 16), (10, 13), (8, 10), (1, 4), (19, 25)]
>>> g.edges()
[((8, 10), (10, 13)), ((19, 25), (25, 30))]
Now, let's sort the pairs by topological sort:
l = nx.topological_sort(g)
Finally, here's the tricky part. The result will be a DAG. We have to to traverse things recursively, but remember what we visited already.
Let's create a dict of what we visited:
visited = {p: False for p in l}
Now a recursive function, that given a node, returns the maximum range edge from any node reachable from it:
def visit(p):
neighbs = g.neighbors(p)
if visited[p] or not neighbs:
visited[p] = True
return p[1]
mx = max([visit(neighb_p) for neighb_p in neighbs])
visited[p] = True
return mx
We're all ready. Let's create a list for the final pairs:
final_l = []
and visit all nodes:
for p in l:
if visited[p]:
continue
final_l.append((p[0], visit(p)))
Here's the final result:
>>> final_l
[(1, 4), (8, 13), (14, 16)]
If they don't overlap, then you can sort them, and then just combine adjacent ones.
Here's a generator that yields the new tuples:
def combine_ranges(L):
L = sorted(L) # Make a copy as we're going to remove items!
while L:
start, end = L.pop(0) # Get the first item
while L and L[0][0] == end:
# While the first of the rest connects to it, adjust
# the end and remove the first of the rest
_, end = L.pop(0)
yield (start, end)
print(list(combine_ranges(List)))
If speed is important, use a collections.deque
instead of a list, so that the .pop(0)
operations can be in constant speed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With