
How can I check that a list is in my array in Python?

For example, if I have:

import numpy as np
A = np.array([[2,3,4],[5,6,7]])

and I want to check whether the following list is the same as one of the rows that the array consists of:

B = [2,3,4]

I tried

B in A  # returns True

But the following also returns True, when it should return False:

B = [2,2,2]
B in A
Magnus Keinicke asked Oct 29 '18 15:10



2 Answers

Try this generator expression. The built-in any() short-circuits, so it stops at the first matching row and skips evaluations you don't need.

any(np.array_equal(row, B) for row in A)

For now, np.array_equal itself doesn't short-circuit internally: it compares every element of a row even after a mismatch has been found. The performance impact of different ways of accomplishing this is discussed in a different question.
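
To see why the plain `in` test misleads: for NumPy arrays, `B in A` behaves like `(A == B).any()`, so it is True as soon as any single element matches. Here is a minimal sketch contrasting that with the row-wise check (the helper name `row_in_array` is made up for illustration):

```python
import numpy as np

A = np.array([[2, 3, 4], [5, 6, 7]])

def row_in_array(B, A):
    """Return True if B equals some complete row of A (hypothetical helper)."""
    return any(np.array_equal(row, B) for row in A)

# Element-wise membership: True because the first element of the
# first row (2) matches the first element of [2, 2, 2].
loose = [2, 2, 2] in A

# Row-wise membership: only True when a complete row matches.
strict_match = row_in_array([2, 3, 4], A)
strict_miss = row_in_array([2, 2, 2], A)

print(loose, strict_match, strict_miss)  # True True False
```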

As @Dan mentions below, broadcasting is another valid way to solve this problem, and it's often (though not always) a better way. For some rough heuristics, here's how you might want to choose between the two approaches. As with any other micro-optimization, benchmark your results.

Generator Expression

  • Reduced memory footprint (the intermediate array B==A is never created)
  • Short-circuiting (if the first row of A is B, we don't have to look at the rest)
  • When rows are large (what counts as large depends on your system, but could be anywhere from ~100 to ~100,000 elements), broadcasting isn't noticeably faster.
  • Uses built-in language features. You have numpy installed anyway, but I'm partial to using the core language when there isn't a reason to do otherwise.
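
The short-circuiting point can be made visible with a small sketch. The logging generator `rows_with_log` and the `visited` list are hypothetical additions whose only job is to record which rows actually get compared:

```python
import numpy as np

A = np.arange(30).reshape(10, 3)  # rows are [0, 1, 2], [3, 4, 5], ...
B = [0, 1, 2]                     # equals the first row of A

visited = []  # indices of the rows that were actually examined

def rows_with_log(arr):
    """Yield rows of arr, recording each index as it is handed out."""
    for i, row in enumerate(arr):
        visited.append(i)
        yield row

found = any(np.array_equal(row, B) for row in rows_with_log(A))

print(found, visited)  # True [0]
```

Because any() stops at the first True, only row 0 is examined here; the other nine rows are never touched.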

Broadcasting

  • Fastest way to solve an extremely broad range of problems using numpy. Using it here is good practice.
  • If we do have to search through every row in A (i.e. if more often than not we expect B not to be in A), broadcasting will almost always be faster (though not necessarily by much; see the next point).
  • When rows are smallish, the generator expression won't be able to vectorize the computations efficiently, so broadcasting will be substantially faster (unless of course you have enough rows that short-circuiting outweighs that concern).
  • In a broader context where you have more numpy code, using broadcasting here helps keep patterns consistent across your code base. Coworkers and future you will appreciate not having a mix of coding styles and patterns.
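
Since these heuristics are rough, a quick timing run on your own data is the safest guide. The following is only a sketch: the array shape, the repeat count, and the function names `with_generator` and `with_broadcasting` are arbitrary choices for illustration, and the numbers will differ from machine to machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 10, size=(1000, 5))  # 1000 short rows of values 0..9
B_absent = np.full(5, -1)                # -1 never occurs in A: worst case, every row is checked

def with_generator(A, B):
    # Row-by-row check; stops at the first matching row.
    return any(np.array_equal(row, B) for row in A)

def with_broadcasting(A, B):
    # Compares B against all rows at once, then reduces.
    return bool((A == B).all(axis=1).any())

t_gen = timeit.timeit(lambda: with_generator(A, B_absent), number=20)
t_bc = timeit.timeit(lambda: with_broadcasting(A, B_absent), number=20)

print(f"generator: {t_gen:.4f}s  broadcasting: {t_bc:.4f}s")
```

Try varying the row length, the number of rows, and whether B is present early in A to see where the trade-off flips for your case.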
Hans Musgrave answered Oct 27 '22 01:10



You can do it by using broadcasting like this:

import numpy as np
A = np.array([[2,3,4],[5,6,7]])
B = np.array([2,3,4]) # Or [2,3,4], a list will work fine here too

(B==A).all(axis=1).any()
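
To see what each step of that one-liner does, here is a sketch that names the intermediate results, using the [2,2,2] case from the question (the variable names are chosen only for illustration):

```python
import numpy as np

A = np.array([[2, 3, 4], [5, 6, 7]])
B = np.array([2, 2, 2])

elementwise = (B == A)             # B is broadcast against every row of A
per_row = elementwise.all(axis=1)  # True only where an entire row matches
result = per_row.any()             # True if at least one row matched

print(elementwise)  # [[ True False False]
                    #  [False False False]]
print(per_row)      # [False False]
print(result)       # False
```

The single matching element that fooled `B in A` shows up in `elementwise`, but `.all(axis=1)` rejects the row because the remaining elements differ.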
Dan answered Oct 27 '22 00:10
