Check if each element in a numpy array is in another array

This problem seems easy but I cannot quite get a nice-looking solution. I have two numpy arrays (A and B), and I want to get the indices of A where the elements of A are in B and also get the indices of A where the elements are not in B.

So, if

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])

Currently I am using

C = np.searchsorted(A,B)

which takes advantage of the fact that A is sorted, and gives me [1, 3, 5], the indices of the elements of A that are in B. This is great, but how do I get D = [0,2,4,6], the indices of the elements of A that are not in B?

asked Apr 11 '13 02:04

DanHickstein

2 Answers

searchsorted may give you a wrong answer if not every element of B is in A. You can use numpy.in1d instead:

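import numpy as np
A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6,8])
mask = np.in1d(A, B)  # True where the element of A also occurs in B
print(np.where(mask)[0])   # indices of A whose elements are in B
print(np.where(~mask)[0])  # indices of A whose elements are not in B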

output is:

[1 3 5]
[0 2 4 6]
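To make the searchsorted caveat concrete, here is a minimal sketch with made-up data in which 4 is in B but missing from A. It uses np.isin, which the NumPy documentation recommends over np.in1d for new code; the variable names are only illustrative.

```python
import numpy as np

# Hypothetical data: 4 is in B but not in A.
A = np.array([1, 2, 3, 5, 6, 7])
B = np.array([2, 4, 6])

# searchsorted returns insertion points, so 4 maps to index 3,
# where A actually holds 5. That index is not a real match.
insertion_points = np.searchsorted(A, B)

# A membership mask only marks elements of A that really occur in B.
mask = np.isin(A, B)
in_B = np.flatnonzero(mask)       # indices of A whose elements are in B
not_in_B = np.flatnonzero(~mask)  # indices of A whose elements are not in B

print(insertion_points, in_B, not_in_B)
```

The mask-based result also does not require A to be sorted.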

However, in1d() relies on sorting, which can be slow for large datasets. If your dataset is large, you can use pandas instead:

import pandas as pd
np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]
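The pandas line above only returns the indices of A that are in B. As a rough sketch, the complement can be read off the same get_indexer result, since it reports -1 for values that are not found in the index. The names positions, in_B and not_in_B are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6, 8])

# For each element of A: its position in the index built from B,
# or -1 when the element does not occur in B.
positions = pd.Index(pd.unique(B)).get_indexer(A)

in_B = np.flatnonzero(positions >= 0)       # indices of A whose elements are in B
not_in_B = np.flatnonzero(positions == -1)  # indices of A whose elements are not in B

print(in_B, not_in_B)
```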

Here is the time comparison:

A = np.random.randint(0, 1000, 10000)
B = np.random.randint(0, 1000, 10000)
%timeit np.where(np.in1d(A, B))[0]
%timeit np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]

output:

100 loops, best of 3: 2.09 ms per loop
1000 loops, best of 3: 594 µs per loop

answered Sep 17 '22 01:09

HYRY

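import numpy as np
A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])
C = np.searchsorted(A, B)
# Remove the matched indices from the full index range of A
# (len(A) replaces the deprecated np.alen)
D = np.delete(np.arange(len(A)), C)
D  # array([0, 2, 4, 6])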

answered Sep 21 '22 01:09

askewchan
