 

How to select the inverse of indexes of a NumPy array?

I have a large data set in which I need to compute the distances between a set of samples taken from the array and all the other elements of the array. Below is a very simple example of my data set.

import numpy as np
import scipy.spatial.distance as sd

data = np.array(
    [[ 0.93825827,  0.26701143],
     [ 0.99121108,  0.35582816],
     [ 0.90154837,  0.86254049],
     [ 0.83149103,  0.42222948],
     [ 0.27309625,  0.38925281],
     [ 0.06510739,  0.58445673],
     [ 0.61469637,  0.05420098],
     [ 0.92685408,  0.62715114],
     [ 0.22587817,  0.56819403],
     [ 0.28400409,  0.21112043]]
)

sample_indexes = [1, 2, 3]

# I'd rather not make this
other_indexes = list(set(range(len(data))) - set(sample_indexes))

sample_data = data[sample_indexes]
other_data = data[other_indexes]

# compare them
dists = sd.cdist(sample_data, other_data)

Is there a way to index a NumPy array for the indexes that are NOT the sample indexes? In the example above I build a list called other_indexes. I'd rather not do this for various reasons (large data set, threading, a very VERY low amount of memory on the system this is running on, etc.). Is there a way to do something like...

other_data = data[ indexes not in sample_indexes] 

I read that NumPy masks can do this, so I tried...

other_data = data[~sample_indexes] 

And this gives me an error. Do I have to create a mask?
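The error comes from the ~ operator itself, not from NumPy indexing. A small sketch, using a hypothetical 10-by-2 stand-in array instead of the real data, of what ~ does on a plain list versus an integer array. Only on a boolean array does ~ mean element-wise logical NOT, which is why a mask is the usual way to express "everything except these rows".

```python
import numpy as np

data = np.arange(20.0).reshape(10, 2)  # hypothetical stand-in for the question's data
sample_indexes = [1, 2, 3]

# Python lists do not support the unary ~ operator, so this fails
# before NumPy ever sees the index.
list_error = None
try:
    data[~sample_indexes]
except TypeError as exc:
    list_error = exc
    print("TypeError:", exc)

# On an integer array, ~ is bitwise NOT: ~i == -(i + 1).
flipped = ~np.array(sample_indexes)
print(flipped)  # [-2 -3 -4]

# Negative indexes count from the end, so this picks rows 8, 7 and 6
# rather than "every row except 1, 2 and 3".
print(data[flipped])
```

So converting the list to an integer array would not raise an error, but it would silently select the wrong rows.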

b10hazard asked Aug 15 '14 17:08





2 Answers

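mask = np.ones(len(data), dtype=bool)  # one True per row; builtin bool avoids the np.bool alias removed in NumPy 1.24
mask[sample_indexes] = False           # switch off the sample rows
other_data = data[mask]                # every row that is not a sample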

Not the most elegant solution for what perhaps should be a single-line statement, but it's fairly efficient, and the memory overhead is minimal too: the boolean mask needs only one byte per row.
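A sketch of the same mask used in both directions, assuming a hypothetical random array in place of the question's data. Because ~ on a boolean array is element-wise logical NOT, one mask yields both the remaining rows and the samples, and the result can be compared against the set-difference list built in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((10, 2))  # hypothetical stand-in with the question's shape
sample_indexes = [1, 2, 3]

mask = np.ones(len(data), dtype=bool)
mask[sample_indexes] = False

other_data = data[mask]    # rows not in sample_indexes
sample_data = data[~mask]  # the complementary rows, in ascending index order

# Same rows as the set-difference list from the question.
other_indexes = sorted(set(range(len(data))) - set(sample_indexes))
print(np.array_equal(other_data, data[other_indexes]))  # True

print(mask.nbytes)  # 10: one byte per row
```

Note that data[~mask] returns rows in ascending order without duplicates, so it matches data[sample_indexes] only when sample_indexes is sorted and unique.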

If memory is your prime concern, np.delete saves you from building the mask yourself; either way, fancy indexing creates a copy of the selected rows.
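The copy-versus-view distinction can be checked with np.shares_memory. A minimal sketch on a hypothetical stand-in array: boolean and integer-array (fancy) indexing both return new arrays, while a basic slice is a view that reuses the original buffer.

```python
import numpy as np

data = np.arange(20.0).reshape(10, 2)  # hypothetical stand-in data
mask = np.ones(len(data), dtype=bool)
mask[[1, 2, 3]] = False

masked = data[mask]      # boolean indexing: new array
fancy = data[[0, 4, 5]]  # integer-array indexing: new array
sliced = data[4:]        # basic slicing: a view, no copy

print(np.shares_memory(masked, data))  # False
print(np.shares_memory(fancy, data))   # False
print(np.shares_memory(sliced, data))  # True
```

A view is only possible when the kept rows form a regular slice, which "every row except an arbitrary set" generally does not.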

On second thought: np.delete does not modify the existing array (it returns a new one), so np.delete(data, sample_indexes, axis=0) is pretty much exactly the single-line statement you are looking for.
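A quick check of np.delete on a hypothetical stand-in array, showing that the input is left intact and why the axis argument matters: without it, np.delete operates on the flattened array and removes individual elements instead of rows.

```python
import numpy as np

data = np.arange(20.0).reshape(10, 2)  # hypothetical stand-in data
sample_indexes = [1, 2, 3]

# axis=0 removes whole rows and returns a new array.
other_data = np.delete(data, sample_indexes, axis=0)
print(other_data.shape)  # (7, 2)
print(data.shape)        # (10, 2): the original is unchanged

# Without axis, the array is flattened first and elements 1, 2, 3 are removed.
flat = np.delete(data, sample_indexes)
print(flat.shape)        # (17,)
```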

Eelco Hoogendoorn answered Sep 17 '22 04:09



You may want to try np.in1d:

In [5]: select = np.in1d(range(data.shape[0]), sample_indexes)

In [6]: print(data[select])
[[ 0.99121108  0.35582816]
 [ 0.90154837  0.86254049]
 [ 0.83149103  0.42222948]]

In [7]: print(data[~select])
[[ 0.93825827  0.26701143]
 [ 0.27309625  0.38925281]
 [ 0.06510739  0.58445673]
 [ 0.61469637  0.05420098]
 [ 0.92685408  0.62715114]
 [ 0.22587817  0.56819403]
 [ 0.28400409  0.21112043]]
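np.in1d is deprecated as of NumPy 2.0 in favour of np.isin, which takes the same arguments for this use. A sketch with a hypothetical stand-in array; ~select works here, unlike ~sample_indexes in the question, because select is a boolean array and ~ means logical NOT on booleans.

```python
import numpy as np

data = np.arange(20.0).reshape(10, 2)  # hypothetical stand-in data
sample_indexes = [1, 2, 3]

# Boolean array: True where the row index appears in sample_indexes.
select = np.isin(np.arange(data.shape[0]), sample_indexes)

sample_data = data[select]  # rows 1, 2, 3
other_data = data[~select]  # every other row
```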
CT Zhu answered Sep 20 '22 04:09
