
Replace NaN's in NumPy array with closest non-NaN value

I have a NumPy array a like the following:

>>> str(a)
'[        nan         nan         nan  1.44955726  1.44628034  1.44409573\n  1.4408188   1.43657094  1.43171624  1.42649744  1.42200684  1.42117704\n  1.42040255  1.41922908         nan         nan         nan         nan\n         nan         nan]'

I want to replace each NaN with the closest non-NaN value, so that all of the NaN's at the beginning get set to 1.449... and all of the NaN's at the end get set to 1.419....

I can see how to do this for specific cases like this, but I need to be able to do it generally for any length of array, with any length of NaN's at the beginning and end of the array (there will be no NaN's in the middle of the numbers). Any ideas?

I can find the NaN's easily enough with np.isnan(), but I can't work out how to get the closest value to each NaN.

robintw asked Mar 02 '12 17:03



3 Answers

As an alternative solution (this will also linearly interpolate across NaNs in the middle of the array):

import numpy as np

# Generate data...
data = np.random.random(10)
data[:2] = np.nan
data[-1] = np.nan
data[4:6] = np.nan

print(data)

# Fill in NaN's...
mask = np.isnan(data)
data[mask] = np.interp(np.flatnonzero(mask), np.flatnonzero(~mask), data[~mask])

print(data)

This yields:

[        nan         nan  0.31619306  0.25818765         nan         nan
  0.27410025  0.23347532  0.02418698         nan]
[ 0.31619306  0.31619306  0.31619306  0.25818765  0.26349185  0.26879605
  0.27410025  0.23347532  0.02418698  0.02418698]
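
The edge behaviour above comes from np.interp itself: for query points outside the range of the known x-coordinates it returns the first or last known value by default, which is why the leading and trailing NaNs end up equal to their nearest valid neighbour. A minimal sketch with fixed data so the result is easy to check by hand (the helper name fill_nans_interp is made up for illustration and is not part of the original answer):

```python
import numpy as np

def fill_nans_interp(data):
    """Return a copy of 1-D `data` with NaNs filled via np.interp (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    mask = np.isnan(data)
    if mask.all():
        # No valid values to interpolate from; return an unchanged copy.
        return data.copy()
    filled = data.copy()
    # Known points: positions and values of the non-NaN entries.
    # Query points: positions of the NaN entries.
    filled[mask] = np.interp(np.flatnonzero(mask), np.flatnonzero(~mask), data[~mask])
    return filled

example = np.array([np.nan, np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])
result = fill_nans_interp(example)
print(result)  # edges take the nearest valid value, the interior gap is interpolated
```

With this input the two leading NaNs become 1.0, the trailing NaN becomes 4.0, and the interior gap is filled with 2.0 and 3.0. Note that interior NaNs get interpolated values, not the closest neighbour's value, which is fine for the question as asked because it rules out NaNs in the middle.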
Joe Kington answered Sep 28 '22 05:09


I want to replace each NaN with the closest non-NaN value... there will be no NaN's in the middle of the numbers

The following will do it:

ind = np.where(~np.isnan(a))[0]
first, last = ind[0], ind[-1]
a[:first] = a[first]
a[last + 1:] = a[last]

This is a straight numpy solution requiring no Python loops, no recursion, no list comprehensions etc.
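
For reuse, the same slicing idea could be wrapped in a function. The sketch below is one possible packaging, not part of the original answer: the name fill_edge_nans, the choice to work on a copy, and the handling of an all-NaN input are all assumptions added for illustration.

```python
import numpy as np

def fill_edge_nans(a):
    """Return a copy of 1-D `a` with leading/trailing NaNs set to the nearest valid value."""
    a = np.array(a, dtype=float)  # np.array copies, so the caller's array is untouched
    if a.ndim != 1:
        raise ValueError("only 1-D arrays are supported")
    ind = np.where(~np.isnan(a))[0]  # indices of the non-NaN entries
    if ind.size == 0:
        # Assumption: with no valid values there is nothing to fill from.
        return a
    first, last = ind[0], ind[-1]
    a[:first] = a[first]      # leading NaNs take the first valid value
    a[last + 1:] = a[last]    # trailing NaNs take the last valid value
    return a

a = np.array([np.nan, np.nan, 1.5, 1.4, 1.3, np.nan, np.nan, np.nan])
filled = fill_edge_nans(a)
print(filled)
```

Like the snippet it is based on, this only touches the edges: a NaN between the first and last valid values is left as NaN.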

NPE answered Sep 28 '22 03:09


NaNs have the interesting property of comparing unequal to themselves, so we can quickly find the indices of the non-NaN elements:

idx = np.nonzero(a==a)[0]
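
As a quick check of that property, the following sketch (with a small made-up array) compares the a==a trick against the more explicit ~np.isnan(a); both should produce the same boolean mask for a float array:

```python
import numpy as np

a = np.array([np.nan, 1.0, 2.0, np.nan])

# NaN is the only float value that is not equal to itself.
print(np.nan == np.nan)

self_equal = (a == a)      # False exactly where a is NaN
not_nan = ~np.isnan(a)     # the explicit spelling of the same mask
print(np.array_equal(self_equal, not_nan))

idx = np.nonzero(self_equal)[0]  # positions of the non-NaN elements
print(idx)
```

The np.isnan form is usually easier to read; the self-comparison is shown here only to illustrate the property mentioned above.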

It's now easy to replace the NaNs with the desired values:

for i in range(0, idx[0]):
    a[i]=a[idx[0]]
for i in range(idx[-1]+1, a.size):
    a[i]=a[idx[-1]]

Finally, we can put this in a function:

import numpy as np

def FixNaNs(arr):
    if len(arr.shape)>1:
        raise Exception("Only 1D arrays are supported.")
    idxs=np.nonzero(arr==arr)[0]

    if len(idxs)==0:
        return None

    ret=arr  # no copy is made, so arr itself is modified in place

    for i in range(0, idxs[0]):
        ret[i]=ret[idxs[0]]

    for i in range(idxs[-1]+1, ret.size):
        ret[i]=ret[idxs[-1]]

    return ret

edit

Ouch, coming from C++ I always forget about list ranges... @aix's solution is way more elegant and efficient than my C++ish loops, use that instead of mine.

Matteo Italia answered Sep 28 '22 05:09