I need to concatenate arrays but also merge the end of A with the start of B if they are overlapping.
[1, 2, 4] + [2, 4, 5] -> [1, 2, 4, 5]
[1, 2, 4] + [2, 5, 4] -> [1, 2, 4, 2, 5, 4]
[1, 2, 4] + [1, 2, 4, 5] -> [1, 2, 4, 5]
Note: Order of elements must be preserved, [4, 5] is not the same as [5, 4].
Note 2: The question can be understood like this too: We need the shortest possible extension of A such that the output ends with B.
Of course I can iterate over the second array and compare element-by element, but I am looking for a nice Numpy solution.
Originally misunderstood the problem. The problem, is from my understanding:
Two item suffix of A matches 2 item prefix of B:
[1, 2, 4] +
[2, 4, 5] =>
[1, 2, 4, 5]
No suffix of A matches a prefix of B:
[1, 2, 4] +
[2, 5, 4] ->
[1, 2, 4, 2, 5, 4]
Then we can use this terribly inefficient function:
def merge(A,B):
i = 0
m = 0
# Find largest suffix of A that matches the prefix of B with the same length
while i <= len(A):
if A[-i:] == B[:i] and i > m:
m = i
i += 1
return A + B[m:]
Below is a solution using NumPy. It's not ideal, since it requires a (possibly unneeded) sort, and an iteration. Both the sorting and iteration should be over a relatively small array (or even a single element).
import numpy as np
def merge(left, right):
"""Concatenating two arrays, merging the overlapping end and start of
the left and right array"""
# We can limit the search to the maximum possible overlap between
# the arrays, which is the minimum of the two lengths
l = min(len(left), len(right))
# Find all indices in `right` where the element matches the last element of `left`.
# Need to sort, since the `nonzero` documentation doesn't
# explicitly state whether the returned indices follow the order
# as in `right`
# As long as there are few matches, sorting will not be a showstopper
# Need to reverse the sorted array, to start from the back of the
# right array, work towards the front, until there is a proper match
for i in np.sort(np.nonzero(right[:l] == left[-1])[0])[::-1]:
# Check if the subarrays are equal
if np.all(left[-i-1:] == right[:i+1]):
return np.concatenate([left, right[i+1:]])
# No match
return np.concatenate([left, right])
a = np.array([1, 2, 4])
b = np.array([2, 4, 5])
c = np.array([2, 5, 4])
d = np.array([1, 2, 4, 5])
e = np.array([1, 2, 4, 2])
f = np.array([2, 4, 2, 5])
print(merge(a, b))
print(merge(a, c))
print(merge(a, d))
print(merge(e, b))
print(merge(e, f))
which yields
[1 2 4 5]
[1 2 4 2 5 4]
[1 2 4 5]
[1 2 4 2 4 5]
[1 2 4 2 5]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With