Numpy finding element index in another array

I have an array/set with unique positive integers, i.e.

>>> unique = np.unique(np.random.choice(100, 4, replace=False))

And an array containing multiple elements sampled from this previous array, such as

>>> A = np.random.choice(unique, 100)

I want to map the values of the array A to the positions at which those values occur in unique.

So far the best solution I found is through a mapping array:

>>> table = np.zeros(unique.max()+1, unique.dtype)
>>> table[unique] = np.arange(unique.size)

The above stores, for each element of unique, its index within unique, so the table can later be used to map A through advanced indexing:

>>> table[A]
array([2, 2, 3, 3, 3, 3, 1, 1, 1, 0, 2, 0, 1, 0, 2, 1, 0, 0, 2, 3, 0, 0, 0,
       0, 3, 3, 2, 1, 0, 0, 0, 2, 1, 0, 3, 0, 1, 3, 0, 1, 2, 3, 3, 3, 3, 1,
       3, 0, 1, 2, 0, 0, 2, 3, 1, 0, 3, 2, 3, 3, 3, 1, 1, 2, 0, 0, 2, 0, 2,
       3, 1, 1, 3, 3, 2, 1, 2, 0, 2, 1, 0, 1, 2, 0, 2, 0, 1, 3, 0, 2, 0, 1,
       3, 2, 2, 1, 3, 0, 3, 3], dtype=int32)

This already gives me the correct result. However, if the numbers in unique are very sparse and large, this approach means creating a very large table array just to store a few numbers for later mapping.
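To make the memory concern concrete, here is a minimal sketch with made-up sparse values (they are assumptions for illustration, not the real data). Three values are enough to force a table with about two million entries:

```python
import numpy as np

# Hypothetical sparse, large values: only three numbers to look up.
unique = np.array([5, 1_000, 2_000_000])

# The mapping table needs one slot per integer up to unique.max().
table = np.zeros(unique.max() + 1, dtype=np.intp)
table[unique] = np.arange(unique.size)

print(unique.size, table.size)  # 3 2000001
```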

Is there any better solution?

NOTE: both A and unique are sample arrays, not real arrays. So the question is not how to generate positional indexes; it is just how to efficiently map elements of A to indexes in unique. The pseudocode of what I'd like to speed up in numpy is as follows:

B = np.zeros_like(A)
unique_list = unique.tolist()  # list.index needs a Python list
for i in range(A.size):
    B[i] = unique_list.index(A[i])

(unique is converted to a list above only so that list.index is available).


asked May 26 '16 14:05

Imanol Luengo

3 Answers

The table approach described in your question is the best option when unique is pretty dense, but unique.searchsorted(A) should produce the same result and doesn't require unique to be dense. It does rely on unique being sorted, which np.unique guarantees, and on every value of A being present in unique. searchsorted is great with ints; if anyone is trying to do this kind of thing with floats, which have precision limitations, consider something like this.
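As a rough sketch of the searchsorted idea, using made-up values for unique and A: searchsorted returns insertion points rather than exact matches, so the example also checks that each value of A was really found. The variable names idx, in_range and found are illustrative only.

```python
import numpy as np

# Made-up data: unique is sorted and has no duplicates.
unique = np.array([7, 42, 1_000_003])
A = np.array([42, 7, 7, 1_000_003, 42])

# Binary search for each element of A inside unique.
idx = np.searchsorted(unique, A)

# searchsorted gives insertion points, so confirm every value is present.
in_range = idx < unique.size
found = np.zeros(A.shape, dtype=bool)
found[in_range] = unique[idx[in_range]] == A[in_range]
if not found.all():
    raise KeyError("some values of A do not occur in unique")

print(idx)  # [1 0 0 2 1]
```

No array of size unique.max() is needed here; the extra memory is proportional to A.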


answered Oct 19 '22 01:10

Bi Rico

You can use a standard Python dict with np.vectorize:

inds = {e:i for i, e in enumerate(unique)}
B = np.vectorize(inds.get)(A)
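A slightly more defensive variant might look like the sketch below. It assumes made-up data, uses -1 as an arbitrary marker for values missing from unique, and fixes the output dtype through otypes; the name lookup is hypothetical. Note that np.vectorize is essentially a Python-level loop, so this is convenient rather than fast.

```python
import numpy as np

# Made-up data for illustration.
unique = np.array([7, 42, 1_000_003])
A = np.array([42, 7, 7, 1_000_003, 42])

# Map each value to its position; keys are converted to plain Python ints.
inds = {int(e): i for i, e in enumerate(unique)}

# -1 is an assumed sentinel for values that are not in unique.
lookup = np.vectorize(lambda x: inds.get(int(x), -1), otypes=[np.intp])

B = lookup(A)
print(B)  # [1 0 0 2 1]
```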

answered Oct 19 '22 02:10

hilberts_drinking_problem

The numpy_indexed package (disclaimer: I am its author) contains a vectorized equivalent of list.index, which does not require memory proportional to the max element, but only proportional to the input itself:

import numpy_indexed as npi
npi.indices(unique, A)

Note that it also works for arbitrary dtypes and dimensions. Also, the array being queried does not need to be unique; the first index encountered will be returned, the same as for list.
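For readers who prefer plain NumPy, the same list.index-like behaviour (first index wins, memory proportional to the inputs) can be approximated as in the sketch below. This is not how numpy_indexed is implemented; first_indices is a hypothetical helper, and it assumes a non-empty, one-dimensional haystack.

```python
import numpy as np

def first_indices(haystack, needles):
    """Index of the first occurrence of each needle in haystack (1-D, non-empty)."""
    haystack = np.asarray(haystack)
    needles = np.asarray(needles)

    # Sorted distinct values plus the index of each value's first occurrence.
    values, first = np.unique(haystack, return_index=True)

    # Binary search; clip so that out-of-range positions can still be compared.
    pos = np.minimum(np.searchsorted(values, needles), values.size - 1)

    if not np.array_equal(values[pos], needles):
        raise KeyError("some needles do not occur in haystack")
    return first[pos]

# The haystack is unsorted and contains a duplicate (10 appears twice).
print(first_indices([30, 10, 20, 10], [10, 20, 30, 10]))  # [1 2 0 1]
```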

answered Oct 19 '22 00:10

Eelco Hoogendoorn


Tags: python, arrays, indexing, numpy