given two general numpy 1-d arrays (no guarantees about values whatsoever), I need to check if one is a sub-array of the other.
It's short and easy to do by casting to strings, but probably not the most efficient:
import numpy as np
def is_sub_arr(a1, a2):
return str(a2).strip('[]') in str(a1).strip('[]')
arr1 = np.array([9, 1, 3, 2, 7, 2, 7, 2, 8, 5])
arr2 = np.array([3, 2, 7, 2])
arr3 = np.array([1,3,7])
print(is_sub_arr(arr1,arr2)) # True
print(is_sub_arr(arr1,arr3)) # False
is there an efficient builtin/native numpy way to do this?
Simple Approach: A simple approach is to run two nested loops and generate all subarrays of the array A[] and use one more loop to check if any of the subarray of A[] is equal to the array B[]. Efficient Approach : An efficient approach is to use two pointers to traverse both the array simultaneously.
Use numpy. isin() to find the elements of a array belongs to another array or not. it returns a boolean array matching the shape of other array where elements are to be searched. numpy.
EDIT: You can also make things a lot faster (like 1000x) with less memory using Numba:
import numpy as np
import numba as nb
def is_sub_arr_np(a1, a2):
l1, = a1.shape
s1, = a1.strides
l2, = a2.shape
a1_win = np.lib.stride_tricks.as_strided(a1, (l1 - l2 + 1, l2), (s1, s1))
return np.any(np.all(a1_win == a2, axis=1))
@nb.jit(parallel=True)
def is_sub_arr_nb(a1, a2):
for i in nb.prange(len(a1) - len(a2) + 1):
for j in range(len(a2)):
if a1[i + j] != a2[j]:
break
else:
return True
return False
# Test
np.random.seed(0)
arr1 = np.random.randint(100, size=100_000)
arr2 = np.random.randint(100, size=1_000)
print(is_sub_arr_np(arr1, arr2))
# False
print(is_sub_arr_nb(arr1, arr2))
# False
# Now enforce a match at the end
arr1[-len(arr2):] = arr2
print(is_sub_arr_np(arr1, arr2))
# True
print(is_sub_arr_nb(arr1, arr2))
# True
# Timing
%timeit is_sub_arr_np(arr1, arr2)
# 99.4 ms ± 567 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit is_sub_arr_nb(arr1, arr2)
# 124 µs ± 863 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Not sure this is the most efficient answer but it is one possible solution:
import numpy as np
def is_sub_arr(a1, a2):
l1, = a1.shape
s1, = a1.strides
l2, = a2.shape
a1_win = np.lib.stride_tricks.as_strided(a1, (l1 - l2 + 1, l2), (s1, s1))
return np.any(np.all(a1_win == a2, axis=1))
arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr2 = np.array([4, 5, 6])
arr3 = np.array([4, 5, 7])
print(is_sub_arr(arr1, arr2))
# True
print(is_sub_arr(arr1, arr3))
# False
Although there is already an accepted answer, I'd like to throw in my quick and dirty solution:
memoryview(arr2).tobytes() in memoryview(arr1).tobytes()
This of course only works if the arrays use contiguous memory, so the full function is:
def is_sub_arr_mem(a, sub):
if a.flags['FORC'] and sub.flags['FORC']:
return memoryview(sub).tobytes() in memoryview(a).tobytes()
return None
This is way faster than the short and easy string conversion in the OP and also faster than the (non-numba accelerated) stride and the cut solutions for different array sizes:
Original data (n1 = 10, n2 = 4)
str: 0.142 ms
stride: 0.034 ms
cut: 0.014 ms
mem: 0.003 ms
n1 = 1000, n2 = 4
str: 3.072 ms
stride: 0.100 ms
cut: 0.017 ms
mem: 0.008 ms
n1 = 1000, n2 = 500
str: 4.543 ms
stride: 0.339 ms
cut: 0.018 ms
mem: 0.009 ms
(n1 and n2 are the sizes of arr1 and arr2, respectively)
You can try something like this:
import numpy as np
arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 20, 67, -1])
arr2 = np.array([4, 5, 6])
arr3 = np.array([4, 5, 7])
def is_sub_arr(original, sub):
first_match = np.where(original == sub[0])
if len(first_match) == 0:
print("no matches")
return
else:
for match in first_match[0]:
cut_original = original[match:match + len(sub)]
if np.all(cut_original == sub):
print("Got a match at index ", match)
else:
print("no match")
return
is_sub_arr(arr1, arr2)
is_sub_arr(arr1, arr3)
First, we check if the first element of the subarray is in the original array and get its index with np.where
. Then, for every match, we cut the original array starting at that index and ending at that index plus the length of the sub, and check if this matches the subarray itself.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With