I use Python with numpy.
I have a numpy array of indexes a:
>>> a
array([[5, 7],
       [12, 18],
       [20, 29]])
>>> type(a)
<type 'numpy.ndarray'>
I have a numpy array of indexes b:
>>> b
array([[2, 4],
       [8, 11],
       [33, 35]])
>>> type(b)
<type 'numpy.ndarray'>
I need to join array a with array b:
a + b => [2, 4] [5, 7] [8, 11] [12, 18] [20, 29] [33, 35]
Because a and b are arrays of index ranges, ranges that follow one another without a gap should be merged => [2, 18] [20, 29] [33, 35]
(The ranges [2, 4] [5, 7] [8, 11] [12, 18] run sequentially => 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18 => [2, 18].)
For this example:
>>> out_c
array([[2, 18],
       [20, 29],
       [33, 35]])
Can someone please suggest how I can get out_c?
Update: @Geoff suggested the solution from "python union of multiple ranges". Is this solution the fastest and best for large data arrays?
import numpy as np
ranges = np.vstack((a,b))
# Sort starts and ends independently (column-wise); the union of the ranges is unchanged
ranges.sort(0)
# Indices of ranges that neither touch nor overlap their successor
nonoverlapping = (ranges[1:,0] - ranges[:-1,1] > 1).nonzero()[0]
# Starts are index 0, plus every range that comes right after a gap
starts = np.hstack(([0], nonoverlapping + 1))
# Ends are every range that comes right before a gap, plus the last range (-1)
ends = np.hstack(( nonoverlapping, [-1]))
# Result
result = np.vstack((ranges[starts, 0], ranges[ends, 1])).T
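For reuse, the snippet above can be wrapped in a function. This is a minimal sketch, assuming inclusive integer ranges and at least one input row; the name `union_of_ranges` is made up for illustration and is not part of numpy.

```python
import numpy as np

def union_of_ranges(*arrays):
    """Merge inclusive integer [start, end] rows that touch or overlap.

    Hypothetical helper name; assumes every input is an (n, 2) integer array.
    """
    ranges = np.vstack(arrays)   # vstack copies, so the inputs are not modified
    ranges.sort(axis=0)          # sort starts and ends independently
    # Indices of rows followed by a gap of more than 1
    gaps = (ranges[1:, 0] - ranges[:-1, 1] > 1).nonzero()[0]
    starts = np.hstack(([0], gaps + 1))
    ends = np.hstack((gaps, [-1]))
    return np.vstack((ranges[starts, 0], ranges[ends, 1])).T

a = np.array([[5, 7], [12, 18], [20, 29]])
b = np.array([[2, 4], [8, 11], [33, 35]])
print(union_of_ranges(a, b).tolist())  # [[2, 18], [20, 29], [33, 35]]
```

Because the work is a sort plus a few vectorized passes, the cost depends on the number of ranges and not on how many integers they cover.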
(Old answer) Using lists and sets
import numpy as np
import itertools

def ranges(s):
    """ Converts a sorted list of integers into start, end pairs """
    # Consecutive integers share the same (value - position) difference
    for _, group in itertools.groupby(enumerate(s), lambda pair: pair[1] - pair[0]):
        group = list(group)
        yield group[0][1], group[-1][1]

def intersect(*args):
    """ Converts any number of numpy arrays containing start, end pairs
    into a set of indexes (despite the name, this builds the union) """
    s = set()
    for start, end in np.vstack(args):
        s |= set(range(start, end + 1))
    return s

a = np.array([[5,7],[12, 18],[20,29]])
b = np.array([[2,4],[8,11],[33,35]])
# Sets are unordered, so sort the indexes before grouping them into runs
result = np.array(list(ranges(sorted(intersect(a, b)))))
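The set-based version materializes every covered index, so its memory and run time grow with the total span of the ranges. If plain lists are preferred, a sort-and-sweep avoids that. The sketch below is a different technique from the one above, not the answerer's code, and `merge_ranges` is a hypothetical name; it assumes inclusive integer ranges.

```python
def merge_ranges(pairs):
    """Merge inclusive integer (start, end) pairs that overlap or touch."""
    merged = []
    for start, end in sorted(pairs):
        if merged and start <= merged[-1][1] + 1:
            # Touches or overlaps the previous range: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Gap of more than 1: start a new range
            merged.append([start, end])
    return merged

a = [[5, 7], [12, 18], [20, 29]]
b = [[2, 4], [8, 11], [33, 35]]
print(merge_ranges(a + b))  # [[2, 18], [20, 29], [33, 35]]
```

With numpy inputs, `merge_ranges(np.vstack((a, b)).tolist())` should give the same pairs.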
Not pretty, but it works. I don't like the final loop, but couldn't think of a way of doing without it:
ab = np.vstack((a,b))
ab.sort(axis=0)
# True where a row touches or overlaps the next row
join_with_next = ab[1:, 0] - ab[:-1, 1] <= 1
# Positions where join_with_next flips between True and False
changes = np.where(np.diff(join_with_next))[0]
# A joined run ending at flip k includes row k+1 (boundary k+2);
# a run of unjoined rows ending at flip k stops at row k (boundary k+1)
endpoints = np.concatenate(([0],
                            changes + np.where(join_with_next[changes], 2, 1),
                            [len(ab)]))
lengths = np.diff(endpoints)
new_lengths = lengths.copy()
# Every joined run collapses to a single row
if join_with_next[0] == True:
    new_lengths[::2] = 1
else:
    new_lengths[1::2] = 1
new_endpoints = np.concatenate(([0], np.cumsum(new_lengths)))
print(endpoints, lengths)
print(new_endpoints, new_lengths)
starts = endpoints[:-1]
ends = endpoints[1:]
new_starts = new_endpoints[:-1]
new_ends = new_endpoints[1:]
c = np.empty((new_endpoints[-1], 2), dtype=ab.dtype)
for j, (s,e,ns,ne) in enumerate(zip(starts, ends, new_starts, new_ends)):
    if e-s != ne-ns:
        c[ns:ne] = np.array([np.min(ab[s:e, 0]), np.max(ab[s:e, 1])])
    else:
        c[ns:ne] = ab[s:e]
>>> c
array([[ 2, 18],
       [20, 29],
       [33, 35]])
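One possible way to drop the final loop is to mark where each merged group begins and reduce the end column per group with `np.maximum.reduceat`. This is only a sketch under the same assumptions (inclusive integer ranges, at least one row); the function name `merge_ranges_reduceat` is made up for illustration.

```python
import numpy as np

def merge_ranges_reduceat(a, b):
    """Loop-free merge of inclusive integer ranges (hypothetical helper)."""
    ab = np.vstack((a, b))
    ab.sort(axis=0)  # sort starts and ends independently
    # A row opens a new group when it is the first row, or when there is
    # a gap of more than 1 between it and the previous row's end
    new_group = np.concatenate(([True], ab[1:, 0] - ab[:-1, 1] > 1))
    group_starts = np.flatnonzero(new_group)
    starts = ab[group_starts, 0]
    # Largest end value within each group
    ends = np.maximum.reduceat(ab[:, 1], group_starts)
    return np.column_stack((starts, ends))

a = np.array([[5, 7], [12, 18], [20, 29]])
b = np.array([[2, 4], [8, 11], [33, 35]])
print(merge_ranges_reduceat(a, b).tolist())  # [[2, 18], [20, 29], [33, 35]]
```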