I have a 3D array with shape (1000, 12, 30), and I have a list of 2D arrays of shape (12, 30). What I want to do is check whether these 2D arrays exist in the 3D array. Is there a simple way in Python to do this? I tried the keyword in, but it doesn't work.
There is a way in NumPy: you can do it with np.all.
import numpy as np
a = np.random.rand(3, 1, 2)
b = a[1][0]
np.all(np.all(a == b, 1), 1)
Out[612]: array([False, True, False])
A more concise solution from bnaecker, reducing over both trailing axes at once:
np.all(a == b, axis=(1, 2))
If you only want to check whether it exists or not:
np.any(np.all(a == b, axis=(1, 2)))
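Since the question asks about a list of 2D arrays, the one-liner above can be applied to each candidate in turn. The following is only a minimal sketch under that assumption; the names contains_each, stack and candidates are made up for illustration and do not come from the answers.

```python
import numpy as np

def contains_each(stack, candidates):
    """Return one boolean per candidate: True if it equals some stack[k] exactly."""
    # stack == c broadcasts the 2D candidate against every 2D slice of the stack;
    # all(axis=(1, 2)) collapses each slice to a single truth value.
    return [bool(np.all(stack == c, axis=(1, 2)).any()) for c in candidates]

stack = np.arange(24).reshape(4, 2, 3)
candidates = [stack[2].copy(), stack[2] + 1]   # the second one is not in the stack
result = contains_each(stack, candidates)
print(result)  # [True, False]
```

Each comparison allocates a boolean array the size of the whole 3D array, which is fine for shapes like (1000, 12, 30) but worth keeping in mind for much larger data.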
Here is a fast method (previously used by @DanielF as well as @jaime and others, no doubt) that uses a trick to benefit from short-circuiting: view-cast template-sized blocks to single elements of dtype void. When comparing two such blocks, NumPy stops after the first difference, yielding a huge speed advantage.
>>> def in_(data, template):
...     dv = data.reshape(data.shape[0], -1).view(f'V{data.dtype.itemsize*np.prod(data.shape[1:])}').ravel()
...     tv = template.ravel().view(f'V{template.dtype.itemsize*template.size}').reshape(())
...     return (dv==tv).any()
Example:
>>> a = np.random.randint(0, 100, (1000, 12, 30))
>>> check = a[np.random.randint(0, 1000, (10,))]
>>> check += np.random.random(check.shape) < 0.001
>>>
>>> [in_(a, c) for c in check]
[True, True, True, False, False, True, True, True, True, False]
# compare to other method
>>> (a==check[:, None]).all((-1,-2)).any(-1)
array([ True, True, True, False, False, True, True, True, True,
False])
This gives the same result as the "direct" NumPy approach, but is almost 20x faster:
>>> from timeit import timeit
>>> kwds = dict(globals=globals(), number=100)
>>>
>>> timeit("(a==check[:, None]).all((-1,-2)).any(-1)", **kwds)
0.4793281531892717
>>> timeit("[in_(a, c) for c in check]", **kwds)
0.026218891143798828
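A related idea, shown here only as a rough sketch and not as the method timed above, is to compare blocks by their raw bytes using a Python set. It swaps the void view for ndarray.tobytes(), which may pay off when many templates are checked against the same data. The helper names build_block_set and in_set are hypothetical. Note that a byte-wise comparison is not always the same as ==: for floats, 0.0 and -0.0 have different bytes, and the dtypes must match, which is why the template is cast first.

```python
import numpy as np

def build_block_set(data):
    """Collect the raw bytes of every data[k] in a Python set (built once)."""
    return {block.tobytes() for block in data}

def in_set(block_set, template, dtype):
    """Check one template by its bytes, after casting it to the data's dtype."""
    # Equal values stored with different dtypes have different bytes, hence the cast.
    return np.asarray(template, dtype=dtype).tobytes() in block_set

rng = np.random.default_rng(0)
data = rng.integers(0, 100, size=(1000, 12, 30))
blocks = build_block_set(data)

hit = data[17].copy()
miss = data[17].copy()
miss[0, 0] += 1000   # values outside 0..99, so no block in data can match

print(in_set(blocks, hit, data.dtype), in_set(blocks, miss, data.dtype))
```

Building the set costs one pass over the data; after that, each lookup hashes a single template instead of scanning all 1000 blocks.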
Given:
a = np.arange(12).reshape(3, 2, 2)
lst = [
np.arange(4).reshape(2, 2),
np.arange(4, 8).reshape(2, 2)
]
print(a, *lst, sep='\n{}\n'.format('-' * 20))
[[[ 0  1]
  [ 2  3]]

 [[ 4  5]
  [ 6  7]]

 [[ 8  9]
  [10 11]]]
--------------------
[[0 1]
 [2 3]]
--------------------
[[4 5]
 [6 7]]
Notice that lst is a list of arrays, as per the OP. I'll make that a 3D array b below.
Use broadcasting. Following the broadcasting rules, I want a to have shape (1, 3, 2, 2) and b to have shape (2, 1, 2, 2), so that every sub-array of b is compared against every sub-array of a.
b = np.array(lst)
x, *y = b.shape                 # x = number of candidates, y = their 2D shape
c = np.equal(
    a.reshape(1, *a.shape),     # (1, 3, 2, 2)
    b.reshape(x, 1, *y)         # (2, 1, 2, 2)
)
I'll use all to produce a (2, 3) array of truth values and np.where to find out which of the a and b sub-arrays are actually equal.
i, j = np.where(c.all((-2, -1)))
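This is just a verification that we achieved what we were after. Because c has shape (2, 3, 2, 2), i indexes the sub-arrays of b and j indexes the sub-arrays of a. We should observe that for each pair of i and j values, the sub-arrays are actually the same.
for t in zip(i, j):
    # t[0] comes from i (index into b), t[1] comes from j (index into a)
    print(a[t[1]], b[t[0]], sep='\n\n')
    print('------')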
[[0 1]
 [2 3]]

[[0 1]
 [2 3]]
------
[[4 5]
 [6 7]]

[[4 5]
 [6 7]]
------
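The broadcasting steps above can be gathered into one small function. This is a sketch rather than the answer's own code: the name match_pairs is hypothetical, and it uses None indexing instead of reshape to add the extra axes, which should be equivalent here.

```python
import numpy as np

def match_pairs(a, lst):
    """Return (index into lst, index into a) for every exact sub-array match."""
    b = np.asarray(lst)                            # shape (m, r, c)
    # b[:, None] has shape (m, 1, r, c) and a[None] has shape (1, n, r, c),
    # so the comparison broadcasts to (m, n, r, c) before being reduced to (m, n).
    eq = (b[:, None] == a[None]).all((-2, -1))
    bi, ai = np.nonzero(eq)
    return [(int(p), int(q)) for p, q in zip(bi, ai)]

a = np.arange(12).reshape(3, 2, 2)
lst = [np.arange(8, 12).reshape(2, 2), np.zeros((2, 2), dtype=int)]
pairs = match_pairs(a, lst)
print(pairs)  # [(0, 2)]
```

Here the match sits at different positions in lst and a, which makes it visible that the first index returned by np.where or np.nonzero belongs to lst and the second to a. The intermediate boolean array holds m * n * r * c elements, so this approach trades memory for convenience.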
However, to complete the OP's thought on using in, convert both sides to nested lists first:
a_ = a.tolist()
list(filter(lambda x: x.tolist() in a_, lst))
[array([[0, 1],
[2, 3]]), array([[4, 5],
[6, 7]])]