Python: sorting an array with NaNs

Note: I'm using Python and numpy arrays.

I have many arrays which all have two columns and many rows. There are some NaN values in the second column; the first column only has numbers.

I would like to sort each array in increasing order according to the second column, leaving the NaN values out. It's a big dataset, so I would rather not have to convert the NaN values into zeros or some other placeholder.

I'd like it to sort like so:

105.  4.
22.   10.
104.  26.
...
...
...
53.   520.
745.  902.
184.  nan
19.   nan

First I tried using fix_invalid which converts the NaNs into 1x10^20:

from operator import itemgetter
from numpy import array, genfromtxt, ma

# data.txt holds one of the arrays: 2 columns and many rows.
Data_0_30 = array(genfromtxt(fname='data.txt'))

g = open("iblah.txt", "a") #saves to file

def Sorted_i_M_W(mass):
    masked = ma.fix_invalid(mass)
    print  >> g, array(sorted(masked, key=itemgetter(1)))

Sorted_i_M_W(Data_0_30)

g.close()

Or I replaced the function with something like this:

def Sorted_i_M_W(mass):
    sortedmass = sorted( mass, key=itemgetter(1))
    print  >> g, array(sortedmass)

For each attempt I got something like:

...
[  4.46800000e+03   1.61472200e+11]
[  3.72700000e+03   1.74166300e+11]
[  4.91800000e+03   1.75502300e+11]
[  6.43500000e+03              nan]
[  3.95520000e+04   8.38907500e+09]
[  3.63750000e+04   1.27625700e+10]
[  2.08810000e+04   1.28578500e+10]
...

At the location of each NaN value, the sorting starts over.

(With fix_invalid, the NaN in the excerpt above shows up as 1.00000000e+20 instead.) But I'd like the sorting to ignore the NaN values completely.

What's the easiest way to sort this array the way I want?

user3207120 asked Jan 17 '14 15:01


2 Answers

I'm not sure whether this can be done with numpy.sort, but numpy.argsort certainly works, because it places NaN values at the end of the sort order:

>>> arr
array([[ 105.,    4.],
       [  53.,  520.],
       [ 745.,  902.],
       [  19.,   nan],
       [ 184.,   nan],
       [  22.,   10.],
       [ 104.,   26.]])
>>> arr[np.argsort(arr[:,1])]
array([[ 105.,    4.],
       [  22.,   10.],
       [ 104.,   26.],
       [  53.,  520.],
       [ 745.,  902.],
       [  19.,   nan],
       [ 184.,   nan]])
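
For anyone who wants to run this directly, here is a minimal self-contained sketch of the same idea. The array values are copied from the example above; the names `order`, `sorted_arr` and `finite_only` are only illustrative. The last line is an optional extra step, assuming "leaving the NaN values out" means dropping those rows from the result rather than just pushing them to the end.

```python
import numpy as np

# Same sample data as in the interactive session above.
arr = np.array([[105.,   4.],
                [ 53., 520.],
                [745., 902.],
                [ 19., np.nan],
                [184., np.nan],
                [ 22.,  10.],
                [104.,  26.]])

# argsort on the second column; NaN entries are ordered after all numbers.
order = np.argsort(arr[:, 1])
sorted_arr = arr[order]

# Optional: drop the rows whose second column is NaN.
finite_only = sorted_arr[~np.isnan(sorted_arr[:, 1])]

print(sorted_arr)
print(finite_only)
```

Only the row order changes; the NaN values themselves are never replaced, so nothing in the dataset has to be converted.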

alko answered Oct 01 '22 18:10


You can create a masked array:

import numpy as np

a = np.loadtxt('test.txt')

mask = np.isnan(a)
ma = np.ma.masked_array(a, mask=mask)

And then sort a using the masked array:

a[np.argsort(ma[:, 1])]
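
A runnable sketch of the masked-array approach, assuming a small inline array in place of test.txt (the sample values and the names `masked`, `order` and `sorted_a` are made up for illustration). It calls the masked array's own argsort method, which by default puts masked entries at the end.

```python
import numpy as np

# Stand-in for np.loadtxt('test.txt'): two columns, NaN in the second one.
a = np.array([[105.,   4.],
              [ 19., np.nan],
              [ 53., 520.],
              [ 22.,  10.]])

# Mask every NaN entry.
masked = np.ma.masked_array(a, mask=np.isnan(a))

# MaskedArray.argsort places masked entries last by default (endwith=True).
order = masked[:, 1].argsort()

# Index the original array, so the NaN values are kept but sorted to the end.
sorted_a = a[order]
print(sorted_a)
```

Indexing `a` rather than `masked` returns a plain ndarray in which the NaN rows simply come last.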

Saullo G. P. Castro answered Oct 01 '22 19:10