Note: I'm using Python and numpy arrays.
I have many arrays which all have two columns and many rows. There are some NaN values in the second column; the first column only has numbers.
I would like to sort each array in increasing order according to the second column, with the NaN values kept out of the ordering (they can simply go at the end). It's a big dataset, so I would rather not convert the NaN values into zeros or anything similar.
I'd like it to sort like so:
105.  4.
22.   10.
104.  26.
...
...
...
53.   520.
745.  902.
184.  nan
19.   nan
First I tried numpy.ma.fix_invalid, which replaces the NaNs with 1e+20:
from operator import itemgetter
from numpy import array, genfromtxt, ma

# data.txt has one of the arrays: 2 columns and many rows.
Data_0_30 = array(genfromtxt(fname='data.txt'))
g = open("iblah.txt", "a")  # appends the results to a file
def Sorted_i_M_W(mass):
    masked = ma.fix_invalid(mass)  # masks NaNs and sets their data to 1e+20
    print(array(sorted(masked, key=itemgetter(1))), file=g)
Sorted_i_M_W(Data_0_30)
g.close()
Or I replaced the function with something like this:
def Sorted_i_M_W(mass):
    sortedmass = sorted(mass, key=itemgetter(1))  # NaN keys compared as-is
    print(array(sortedmass), file=g)
For each attempt I got something like:
...
[  4.46800000e+03   1.61472200e+11]
[  3.72700000e+03   1.74166300e+11]
[  4.91800000e+03   1.75502300e+11]
[  6.43500000e+03              nan]
[  3.95520000e+04   8.38907500e+09]
[  3.63750000e+04   1.27625700e+10]
[  2.08810000e+04   1.28578500e+10]
...
At the location of the NaN value, the sorting restarts.
(With fix_invalid, the NaN in the above excerpt shows up as 1.00000000e+20.) But I'd like the sort to ignore the NaN values completely.
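Both symptoms follow from how NaN behaves. Every ordering comparison involving NaN returns False, so Python's `sorted()` has no consistent order to work with and can leave a NaN row in the middle with the rows around it sorted as separate runs. `fix_invalid` avoids the NaN comparison, but only by writing its default float fill value, 1e+20, into the data. A small sketch of both effects, using made-up rows:

```python
import numpy as np
from operator import itemgetter

# Made-up rows in the same two-column layout, with one NaN key.
rows = np.array([[105., 4.], [19., np.nan], [53., 520.], [22., 10.]])

# Every ordering comparison with NaN is False, in both directions,
# so NaN is neither smaller nor larger than 1.0.
print(np.nan < 1.0, np.nan > 1.0)  # False False

# sorted() assumes a consistent ordering; with a NaN key the result is
# not guaranteed to be sorted, and the NaN row can land mid-array.
by_python = np.array(sorted(rows, key=itemgetter(1)))
print(by_python)

# fix_invalid (copy=True by default) masks the NaN and overwrites it in
# the underlying data with the default float fill value, 1e+20.
fixed = np.ma.fix_invalid(rows)
print(fixed.data[:, 1])
print(np.ma.getmaskarray(fixed)[:, 1])
```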
What's the easiest way to sort this array the way I want?
numpy.sort alone won't do it, since it sorts values rather than whole rows, but numpy.argsort on the second column gives the row order, and NumPy places NaN after all other values:
>>> arr
array([[ 105.,    4.],
       [  53.,  520.],
       [ 745.,  902.],
       [  19.,   nan],
       [ 184.,   nan],
       [  22.,   10.],
       [ 104.,   26.]])
>>> arr[np.argsort(arr[:,1])]
array([[ 105.,    4.],
       [  22.,   10.],
       [ 104.,   26.],
       [  53.,  520.],
       [ 745.,  902.],
       [  19.,   nan],
       [ 184.,   nan]])
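A minimal sketch that wraps the same argsort call in a function you can apply to each of your arrays. It relies on NumPy sorting NaN to the end; the helper names `sort_by_col_nan_last` and `drop_nan_rows` are hypothetical, and the second one is only needed if the NaN rows should be removed rather than moved to the bottom.

```python
import numpy as np


def sort_by_col_nan_last(arr, col=1):
    """Return the rows of a 2-D array ordered by column `col`.

    np.argsort puts NaN after every number, so rows with a NaN key end
    up at the bottom without converting the NaNs to anything else.
    kind="stable" keeps rows with equal keys (including NaN) in input order.
    """
    order = np.argsort(arr[:, col], kind="stable")
    return arr[order]


def drop_nan_rows(arr, col=1):
    """Return only the rows whose column `col` is not NaN."""
    return arr[~np.isnan(arr[:, col])]


# The same sample rows as in the session above.
arr = np.array([[105., 4.], [53., 520.], [745., 902.], [19., np.nan],
                [184., np.nan], [22., 10.], [104., 26.]])

sorted_arr = sort_by_col_nan_last(arr)                # NaN rows last
clean_arr = sort_by_col_nan_last(drop_nan_rows(arr))  # NaN rows removed
print(sorted_arr)
print(clean_arr)
```

For many arrays, call the function once per array, e.g. `[sort_by_col_nan_last(a) for a in arrays]`. Only an index vector is computed and used to reorder the rows, so each row stays intact and the NaNs themselves are never modified.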
Alternatively, you can create a masked array:
import numpy as np

a = np.loadtxt('test.txt')
mask = np.isnan(a)
masked = np.ma.masked_array(a, mask=mask)  # NaN entries are masked
And then sort a using the order of the masked second column (masked values go last):
a[np.argsort(masked[:, 1])]
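A self-contained sketch of the masked-array approach. An in-memory string stands in for `test.txt` (the values are made up), and `np.ma.masked_invalid` is used as a one-call shorthand for building the mask with `np.isnan` (note that it also masks infinities). The row order comes from the masked column's own `argsort` method, which by default fills masked entries with a large value before sorting, so those rows normally go last.

```python
import io

import numpy as np

# Stand-in for 'test.txt': two whitespace-separated columns, NaNs only in
# the second column (made-up values).
text = io.StringIO("""105 4
53 520
19 nan
22 10
184 nan
""")
a = np.loadtxt(text)

# Mask every invalid entry (NaN, and +/-inf if any were present).
masked = np.ma.masked_invalid(a)

# MaskedArray.argsort treats masked entries as large values by default,
# so the rows with a NaN key come last; the original array is reordered.
order = masked[:, 1].argsort()
sorted_a = a[order]
print(sorted_a)
```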