I have a large set of data in which I need to compare the distances of a set of samples from this array with all the other elements of the array. Below is a very simple example of my data set.
import numpy as np
import scipy.spatial.distance as sd

data = np.array(
    [[0.93825827, 0.26701143],
     [0.99121108, 0.35582816],
     [0.90154837, 0.86254049],
     [0.83149103, 0.42222948],
     [0.27309625, 0.38925281],
     [0.06510739, 0.58445673],
     [0.61469637, 0.05420098],
     [0.92685408, 0.62715114],
     [0.22587817, 0.56819403],
     [0.28400409, 0.21112043]]
)

sample_indexes = [1, 2, 3]

# I'd rather not make this
other_indexes = list(set(range(len(data))) - set(sample_indexes))

sample_data = data[sample_indexes]
other_data = data[other_indexes]

# compare them
dists = sd.cdist(sample_data, other_data)
Is there a way to index a numpy array for indexes that are NOT the sample indexes? In my above example I make a list called other_indexes. I'd rather not have to do this for various reasons (large data set, threading, a very VERY low amount of memory on the system this is running on etc. etc. etc.). Is there a way to do something like..
other_data = data[ indexes not in sample_indexes]
I read that numpy masks can do this but I tried...
other_data = data[~sample_indexes]
And this gives me an error. Do I have to create a mask?
To select an element from a NumPy array, use the [] operator with the element's index, e.g. arr[i]. It returns only the element at that index.
To remove an element from a NumPy array, call numpy.delete() on the array with the index of the element to remove.
Negative indices are interpreted as counting from the end of the array (i.e., if i < 0, it means n_i + i, where n_i is the number of elements in that dimension). All arrays generated by basic slicing are always views of the original array. The standard rules of sequence slicing apply to basic slicing on a per-dimension basis (including using a step index).
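Since memory is the concern here, it may help to see the view/copy distinction directly. The following is only a small sketch on placeholder data (the array contents and the names sliced, picked and last are made up for illustration): basic slicing shares memory with the original array, while indexing with a list of integers allocates a new array.

```python
import numpy as np

# Placeholder data: 10 rows, 2 columns (not the data from the question)
data = np.arange(20, dtype=float).reshape(10, 2)

sliced = data[4:]         # basic slicing: a view, no data is copied
picked = data[[1, 2, 3]]  # integer-list ("fancy") indexing: a new array
last = data[-1]           # negative index: same row as data[len(data) - 1]

print(np.shares_memory(data, sliced))  # True
print(np.shares_memory(data, picked))  # False
```

So whichever way the non-sample rows are selected by index list or boolean mask, the result is a copy; only a contiguous or regularly strided slice can avoid that.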
mask = np.ones(len(data), dtype=bool)  # np.bool was removed in newer NumPy; use the builtin bool
mask[sample_indexes] = False
other_data = data[mask]
Not the most elegant for what perhaps should be a single-line statement, but it's fairly efficient, and the memory overhead is minimal too: the mask takes one byte per row.
If memory is your prime concern, np.delete saves you from creating the mask explicitly (it may still build one internally), and fancy indexing creates a copy anyway.
On second thought: np.delete does not modify the existing array, it returns a new one, so it's pretty much exactly the single-line statement you are looking for.
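As a rough sketch of that one-liner, using placeholder data rather than the array from the question: note that axis=0 is needed to drop whole rows, because without an axis np.delete works on the flattened array.

```python
import numpy as np

# Placeholder data: 10 rows, 2 columns (not the data from the question)
data = np.arange(20, dtype=float).reshape(10, 2)
sample_indexes = [1, 2, 3]

# axis=0 removes rows 1, 2 and 3; data itself is left unchanged
other_data = np.delete(data, sample_indexes, axis=0)

# Cross-check against the explicit boolean-mask approach
mask = np.ones(len(data), dtype=bool)
mask[sample_indexes] = False
assert np.array_equal(other_data, data[mask])

print(other_data.shape)  # (7, 2)
```

The result can be passed to sd.cdist in place of data[other_indexes].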
You may want to try in1d
In [5]: select = np.in1d(range(data.shape[0]), sample_indexes)

In [6]: print(data[select])
[[ 0.99121108  0.35582816]
 [ 0.90154837  0.86254049]
 [ 0.83149103  0.42222948]]

In [7]: print(data[~select])
[[ 0.93825827  0.26701143]
 [ 0.27309625  0.38925281]
 [ 0.06510739  0.58445673]
 [ 0.61469637  0.05420098]
 [ 0.92685408  0.62715114]
 [ 0.22587817  0.56819403]
 [ 0.28400409  0.21112043]]
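On recent NumPy versions np.in1d is deprecated in favour of np.isin, so the same idea might be written as below. This is a sketch on placeholder data; the names row_ids and is_sample are made up for illustration.

```python
import numpy as np

# Placeholder data: 10 rows, 2 columns (not the data from the question)
data = np.arange(20, dtype=float).reshape(10, 2)
sample_indexes = [1, 2, 3]

row_ids = np.arange(data.shape[0])
is_sample = np.isin(row_ids, sample_indexes)  # boolean mask, one entry per row

sample_data = data[is_sample]   # rows listed in sample_indexes
other_data = data[~is_sample]   # every other row

print(sample_data.shape, other_data.shape)  # (3, 2) (7, 2)
```

One thing to keep in mind: a boolean mask returns rows in their original order, so data[is_sample] only matches data[sample_indexes] when sample_indexes is sorted and has no duplicates.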