How to select inverse of indexes of a numpy array?

I have a large data set stored in an array, and I need to compute the distances between a set of samples from this array and all the other elements of the array. Below is a very simple example of my data set.

import numpy as np
import scipy.spatial.distance as sd

data = np.array(
    [[ 0.93825827,  0.26701143],
     [ 0.99121108,  0.35582816],
     [ 0.90154837,  0.86254049],
     [ 0.83149103,  0.42222948],
     [ 0.27309625,  0.38925281],
     [ 0.06510739,  0.58445673],
     [ 0.61469637,  0.05420098],
     [ 0.92685408,  0.62715114],
     [ 0.22587817,  0.56819403],
     [ 0.28400409,  0.21112043]]
)

sample_indexes = [1,2,3]

# I'd rather not make this
other_indexes = list(set(range(len(data))) - set(sample_indexes))

sample_data = data[sample_indexes]
other_data = data[other_indexes]

# compare them
dists = sd.cdist(sample_data, other_data)

Is there a way to index a numpy array with every index that is NOT one of the sample indexes? In the example above I build a list called other_indexes. I'd rather not have to do this for various reasons (large data set, threading, a very low amount of memory on the system this is running on, etc.). Is there a way to do something like:

other_data = data[ indexes not in sample_indexes]

I read that numpy masks can do this, so I tried:

other_data = data[~sample_indexes]

And this gives me an error. Do I have to create a mask?

262

asked Aug 15 '14 17:08

b10hazard

2 Answers

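# Start with every row selected, then switch off the sample rows.
# The builtin bool is used as the dtype because the np.bool alias is not available in every NumPy release.
mask = np.ones(len(data), dtype=bool)
mask[sample_indexes] = False
other_data = data[mask]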

Not the most elegant for what perhaps should be a single-line statement, but it's fairly efficient, and the memory overhead is minimal too.

If memory is your prime concern, np.delete saves you from building the mask yourself (though it may still allocate a temporary internally), and fancy indexing creates a copy anyway.

On second thought: np.delete does not modify the existing array, so it's pretty much exactly the single-line statement you are looking for.
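A minimal sketch of the np.delete route, assuming a small stand-in array in place of the question's data (the values here are hypothetical, chosen only so the result is easy to check). It also compares the result against the boolean-mask version:

```python
import numpy as np

# Hypothetical stand-in for the question's data: 10 rows, 2 columns.
data = np.arange(20, dtype=float).reshape(10, 2)
sample_indexes = [1, 2, 3]

# axis=0 removes whole rows; without axis, np.delete flattens the array first.
other_data = np.delete(data, sample_indexes, axis=0)

# The explicit boolean mask selects the same rows.
mask = np.ones(len(data), dtype=bool)
mask[sample_indexes] = False
same_as_mask = np.array_equal(other_data, data[mask])
```

np.delete returns a new array and leaves data untouched, so the original array is still available for data[sample_indexes].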

122

answered Sep 17 '22 04:09

Eelco Hoogendoorn

You may want to try np.in1d:

In [5]: select = np.in1d(range(data.shape[0]), sample_indexes)

In [6]: print(data[select])
[[ 0.99121108  0.35582816]
 [ 0.90154837  0.86254049]
 [ 0.83149103  0.42222948]]

In [7]: print(data[~select])
[[ 0.93825827  0.26701143]
 [ 0.27309625  0.38925281]
 [ 0.06510739  0.58445673]
 [ 0.61469637  0.05420098]
 [ 0.92685408  0.62715114]
 [ 0.22587817  0.56819403]
 [ 0.28400409  0.21112043]]
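On recent NumPy versions np.in1d is deprecated in favour of np.isin, so the same idea might be written as in the sketch below. The small data array is a hypothetical stand-in for the question's data, not the original values:

```python
import numpy as np

# Hypothetical stand-in for the question's data: 10 rows, 2 columns.
data = np.arange(20, dtype=float).reshape(10, 2)
sample_indexes = [1, 2, 3]

# True where the row number is one of the sample indexes.
select = np.isin(np.arange(data.shape[0]), sample_indexes)

sample_data = data[select]    # rows 1, 2, 3
other_data = data[~select]    # every remaining row
```

Note that a boolean mask returns rows in array order, so data[select] matches data[sample_indexes] only when sample_indexes is sorted and free of duplicates.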

answered Sep 20 '22 04:09

CT Zhu

Tags: python, numpy, scipy
