given two general numpy 1-d arrays (no guarantees about values whatsoever), I need to check if one is a sub-array of the other. It's short and easy to do by casting to strings, but probably not the most efficient: <pre class="prettyprint"><code>import numpy as np def is_sub_arr(a1, a2): return str(a2).strip('[]') in str(a1).strip('[]') arr1 = np.array([9, 1, 3, 2, 7, 2, 7, 2, 8, 5]) arr2 = np.array([3, 2, 7, 2]) arr3 = np.array([1,3,7]) print(is_sub_arr(arr1,arr2)) # True print(is_sub_arr(arr1,arr3)) # False </code></pre> is there an efficient builtin/native numpy way to do this?

Although there is already an accepted answer, I'd like to throw in my quick and dirty solution: <pre class="prettyprint"><code>memoryview(arr2).tobytes() in memoryview(arr1).tobytes() </code></pre> This of course only works if the arrays use contiguous memory, so the full function is: <pre class="prettyprint"><code>def is_sub_arr_mem(a, sub): if a.flags['FORC'] and sub.flags['FORC']: return memoryview(sub).tobytes() in memoryview(a).tobytes() return None </code></pre> This is way faster than the short and easy string conversion in the OP and also faster than the (non-numba accelerated) stride and the cut solutions for different array sizes: <pre class="prettyprint"><code>Original data (n1 = 10, n2 = 4) str: 0.142 ms stride: 0.034 ms cut: 0.014 ms mem: 0.003 ms n1 = 1000, n2 = 4 str: 3.072 ms stride: 0.100 ms cut: 0.017 ms mem: 0.008 ms n1 = 1000, n2 = 500 str: 4.543 ms stride: 0.339 ms cut: 0.018 ms mem: 0.009 ms </code></pre> (n1 and n2 are the sizes of arr1 and arr2, respectively)

Numpy: check if 1-d array is sub-array of another

Tags:

python

numpy

given two general numpy 1-d arrays (no guarantees about values whatsoever), I need to check if one is a sub-array of the other.

It's short and easy to do by casting to strings, but probably not the most efficient:

import numpy as np

def is_sub_arr(a1, a2):
    return str(a2).strip('[]') in str(a1).strip('[]')

arr1 = np.array([9, 1, 3, 2, 7, 2, 7, 2, 8, 5])
arr2 = np.array([3, 2, 7, 2])
arr3 = np.array([1,3,7])

print(is_sub_arr(arr1,arr2))  # True
print(is_sub_arr(arr1,arr3))  # False

is there an efficient builtin/native numpy way to do this?

371

asked Jul 12 '19 09:07

Adam.Er8

3 Answers

EDIT: You can also make things a lot faster (like 1000x) with less memory using Numba:

import numpy as np
import numba as nb

def is_sub_arr_np(a1, a2):
    l1, = a1.shape
    s1, = a1.strides
    l2, = a2.shape
    a1_win = np.lib.stride_tricks.as_strided(a1, (l1 - l2 + 1, l2), (s1, s1))
    return np.any(np.all(a1_win == a2, axis=1))

@nb.jit(parallel=True)
def is_sub_arr_nb(a1, a2):
    for i in nb.prange(len(a1) - len(a2) + 1):
        for j in range(len(a2)):
            if a1[i + j] != a2[j]:
                break
        else:
            return True
    return False

# Test
np.random.seed(0)
arr1 = np.random.randint(100, size=100_000)
arr2 = np.random.randint(100, size=1_000)
print(is_sub_arr_np(arr1, arr2))
# False
print(is_sub_arr_nb(arr1, arr2))
# False

# Now enforce a match at the end
arr1[-len(arr2):] = arr2
print(is_sub_arr_np(arr1, arr2))
# True
print(is_sub_arr_nb(arr1, arr2))
# True

# Timing
%timeit is_sub_arr_np(arr1, arr2)
# 99.4 ms ± 567 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit is_sub_arr_nb(arr1, arr2)
# 124 µs ± 863 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Not sure this is the most efficient answer but it is one possible solution:

import numpy as np

def is_sub_arr(a1, a2):
    l1, = a1.shape
    s1, = a1.strides
    l2, = a2.shape
    a1_win = np.lib.stride_tricks.as_strided(a1, (l1 - l2 + 1, l2), (s1, s1))
    return np.any(np.all(a1_win == a2, axis=1))

arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr2 = np.array([4, 5, 6])
arr3 = np.array([4, 5, 7])
print(is_sub_arr(arr1, arr2))
# True
print(is_sub_arr(arr1, arr3))
# False

177

answered Oct 22 '22 15:10

jdehesa

Although there is already an accepted answer, I'd like to throw in my quick and dirty solution:

memoryview(arr2).tobytes() in memoryview(arr1).tobytes()

This of course only works if the arrays use contiguous memory, so the full function is:

def is_sub_arr_mem(a, sub):
    if a.flags['FORC'] and sub.flags['FORC']: 
        return memoryview(sub).tobytes() in memoryview(a).tobytes()
    return None

This is way faster than the short and easy string conversion in the OP and also faster than the (non-numba accelerated) stride and the cut solutions for different array sizes:

Original data (n1 = 10, n2 = 4)
str:     0.142 ms
stride:  0.034 ms
cut:     0.014 ms
mem:     0.003 ms

n1 = 1000, n2 = 4
str:     3.072 ms
stride:  0.100 ms
cut:     0.017 ms
mem:     0.008 ms

n1 = 1000, n2 = 500
str:     4.543 ms
stride:  0.339 ms
cut:     0.018 ms
mem:     0.009 ms

(n1 and n2 are the sizes of arr1 and arr2, respectively)

answered Oct 22 '22 16:10

Stef

You can try something like this:

import numpy as np

arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 20, 67, -1])
arr2 = np.array([4, 5, 6])
arr3 = np.array([4, 5, 7])


def is_sub_arr(original, sub):
    first_match = np.where(original == sub[0])
    if len(first_match) == 0:
        print("no matches")
        return

    else:
        for match in first_match[0]:

            cut_original = original[match:match + len(sub)]
            if np.all(cut_original == sub):
                print("Got a match at index ", match)
            else:
                print("no match")
        return


is_sub_arr(arr1, arr2)
is_sub_arr(arr1, arr3)

First, we check if the first element of the subarray is in the original array and get its index with np.where. Then, for every match, we cut the original array starting at that index and ending at that index plus the length of the sub, and check if this matches the subarray itself.

answered Oct 22 '22 15:10

neko

Related questions
                            
                                When will/won't Python suspend execution of a coroutine?
                            
                                Issue with plyer library of python when creating a executable using pyinstaller
                            
                                How to bin column of floats with pandas
                            
                                Is it possible to specify the max amount of time to wait for code to run?
                            
                                Concatenate with broadcast
                            
                                For loop iterates less times than I expected in Python
                            
                                How to extract links from a page using Beautiful soup
                            
                                Arguments of __new__ and __init__ for metaclasses
                            
                                Regex to match all words inside parenthesis
                            
                                Python Paramiko directory walk over SFTP
                            
                                unable to import Metric from tensorflow.keras.metrics
                            
                                Sometimes the eigenvectors calculated in python and mathematica have opposite signs anyone know why?
                            
                                Addng attachment to an EmailMessage raises TypeError: set_text_content() got an unexpected keyword argument 'maintype'
                            
                                How to expand / dilate a numpy array?
                            
                                Transpose a large array without loading into memory
                            
                                How to mock objects methods return value
                            
                                Unable to upload a file from sagemaker notebook to S3
                            
                                How to fix 'No module named vidcap' in python 3.7
                            
                                How to redirect while keeping form data using Flask and WTForms?
                            
                                How to extract coefficients from a line equation in python without using numpy?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With