interpolate missing values 2d python

Tags:

python

numpy

interpolation

I have a 2D array (or matrix, if you prefer) with some missing values represented as NaN. The missing values are typically in a strip along one axis, e.g.:

1   2   3 NaN   5
2   3   4 NaN   6
3   4 NaN NaN   7
4   5 NaN NaN   8
5   6   7   8   9

where I would like to replace the NaNs with somewhat sensible numbers.

I looked into Delaunay triangulation, but found very little documentation.
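For reference, SciPy exposes Delaunay triangulation as `scipy.spatial.Delaunay`, and `scipy.interpolate.LinearNDInterpolator` accepts such a triangulation and interpolates linearly inside each triangle. A minimal sketch on the example matrix above, assuming SciPy is installed (the variable names are illustrative only):

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

nan = np.nan
# The example matrix from the question
data = np.array([
    [1, 2, 3, nan, 5],
    [2, 3, 4, nan, 6],
    [3, 4, nan, nan, 7],
    [4, 5, nan, nan, 8],
    [5, 6, 7, 8, 9],
])

rows, cols = np.indices(data.shape)
valid = ~np.isnan(data)

# Triangulate the (row, col) coordinates of the known cells
tri = Delaunay(np.column_stack((rows[valid], cols[valid])))

# Piecewise-linear interpolation over the Delaunay triangles
interp = LinearNDInterpolator(tri, data[valid])

filled = data.copy()
filled[~valid] = interp(rows[~valid], cols[~valid])
print(filled)
```

This example matrix happens to be a plane (value = row + col + 1, counting from 0), so linear interpolation recovers the missing cells exactly up to rounding; real data will not be that forgiving. Cells outside the convex hull of the known cells would come back as NaN.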

I tried using astropy's convolve, as it supports 2D arrays and is quite straightforward. The problem is that convolution is not interpolation: it moves all values towards the average (which could be mitigated by using a narrow kernel).
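One way to keep the convolution idea without pulling known values towards the average is a normalized convolution that writes its result only into the NaN cells. This is a generic sketch built on `scipy.ndimage.convolve`, not astropy's API; the function name `fill_nan_by_local_mean` and its parameters are hypothetical:

```python
import numpy as np
from scipy import ndimage


def fill_nan_by_local_mean(data, kernel_size=3, max_iter=10):
    """Hypothetical helper: fill NaN cells with the mean of their valid neighbours.

    Known cells are never modified, so they are not smoothed. The loop lets
    wide gaps fill in from their edges, one ring of cells per iteration.
    """
    filled = np.array(data, dtype=float)
    kernel = np.ones((kernel_size, kernel_size))
    for _ in range(max_iter):
        missing = np.isnan(filled)
        if not missing.any():
            break
        # Sum of the valid neighbour values and the number of valid neighbours
        sums = ndimage.convolve(np.where(missing, 0.0, filled), kernel,
                                mode='constant', cval=0.0)
        counts = ndimage.convolve((~missing).astype(float), kernel,
                                  mode='constant', cval=0.0)
        # Update only missing cells that have at least one valid neighbour
        update = missing & (counts > 0)
        filled[update] = sums[update] / counts[update]
    return filled


nan = np.nan
data = np.array([
    [1, 2, 3, nan, 5],
    [2, 3, 4, nan, 6],
    [3, 4, nan, nan, 7],
    [4, 5, nan, nan, 8],
    [5, 6, 7, 8, 9],
])
result = fill_nan_by_local_mean(data)
print(result)
```

The kernel size trades smoothness against locality, much like the narrow-kernel idea mentioned above; this is still local averaging rather than true interpolation.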

This question should be the natural 2-dimensional extension to this post. Is there a way to interpolate over NaN/missing values in a 2D array?


asked Jun 06 '16 16:06

M.T

1 Answer

Yes. You can use scipy.interpolate.griddata together with a masked array, and choose the type of interpolation with the method argument; 'cubic' usually does an excellent job:

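import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt


# Let's create some random data: integers from 0 to 10 inclusive
# (np.random.randint replaces the deprecated np.random.random_integers)
array = np.random.randint(0, 11, (10, 10)).astype(float)
# values greater than 7 become np.nan
array[array > 7] = np.nan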

Plotted with plt.imshow(array, interpolation='nearest'), it looks something like this:

[Figure: imshow plot of the random array before interpolation]

x = np.arange(0, array.shape[1])
y = np.arange(0, array.shape[0])
#mask invalid values
array = np.ma.masked_invalid(array)
xx, yy = np.meshgrid(x, y)
#get only the valid values
x1 = xx[~array.mask]
y1 = yy[~array.mask]
newarr = array[~array.mask]

GD1 = interpolate.griddata((x1, y1), newarr.ravel(),
                           (xx, yy),
                           method='cubic')
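The same steps can be wrapped in a small reusable function. This is a sketch, not part of the original answer: the name `fill_nan_griddata` is hypothetical, it uses a plain boolean mask from `np.isnan` instead of a masked array, and it interpolates only at the missing positions so known values are returned unchanged:

```python
import numpy as np
from scipy import interpolate


def fill_nan_griddata(data, method='cubic'):
    """Hypothetical helper: return a copy of a 2D array with NaNs interpolated."""
    data = np.asarray(data, dtype=float)
    xx, yy = np.meshgrid(np.arange(data.shape[1]), np.arange(data.shape[0]))
    valid = ~np.isnan(data)
    filled = data.copy()
    # Interpolate only at the missing cells, using all valid cells as input
    filled[~valid] = interpolate.griddata(
        (xx[valid], yy[valid]), data[valid],
        (xx[~valid], yy[~valid]), method=method)
    return filled


nan = np.nan
question = np.array([
    [1, 2, 3, nan, 5],
    [2, 3, 4, nan, 6],
    [3, 4, nan, nan, 7],
    [4, 5, nan, nan, 8],
    [5, 6, 7, 8, 9],
])
print(fill_nan_griddata(question, method='linear'))
```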

This is the final result:

[Figure: the same array after cubic griddata interpolation]

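Note that NaN values at the edges that are surrounded by other NaN values lie outside the convex hull of the valid points, so 'linear' and 'cubic' can't interpolate them and they are kept as NaN. You can change this with the fill_value argument (it has no effect for method='nearest').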
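If you would rather fill those edge cells with something other than a constant, one common workaround (not from the original answer) is a second griddata pass with method='nearest', which always returns a value. A sketch assuming NumPy and SciPy; `fill_nan_two_pass` is a hypothetical name and the test plane is made up for illustration:

```python
import numpy as np
from scipy import interpolate


def fill_nan_two_pass(data, method='cubic'):
    """Hypothetical helper: interpolate NaNs, then fill leftovers by nearest neighbour."""
    data = np.asarray(data, dtype=float)
    xx, yy = np.meshgrid(np.arange(data.shape[1]), np.arange(data.shape[0]))
    valid = ~np.isnan(data)
    points = (xx[valid], yy[valid])
    values = data[valid]

    filled = data.copy()
    # First pass: cells outside the convex hull come back as NaN (default fill_value)
    filled[~valid] = interpolate.griddata(points, values,
                                          (xx[~valid], yy[~valid]),
                                          method=method)
    # Second pass: anything still NaN takes the value of the nearest valid cell
    still_missing = np.isnan(filled)
    if still_missing.any():
        filled[still_missing] = interpolate.griddata(
            points, values, (xx[still_missing], yy[still_missing]),
            method='nearest')
    return filled


# A plane (value = row + col + 1) with a NaN in the corner and one inside
rows, cols = np.indices((5, 5))
data = (rows + cols + 1).astype(float)
data[0, 0] = np.nan
data[2, 2] = np.nan
print(fill_nan_two_pass(data))
```

Nearest-neighbour values at the edges are only a rough guess; a constant fill_value may be more honest if you later need to tell those cells apart.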

How would this work if there is a 3x3 region of NaN values? Would you get sensible data for the middle point?

It depends on your data; you have to run some tests. For instance, you could deliberately mask some good data, interpolate the masked array with different methods (e.g. cubic, linear), compute the difference between the interpolated values and the original values you masked, and see which method gives the smallest difference.

You can use something like this:

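# Hide a 3x3 block of known values and keep a copy to compare against
reference = array[3:6, 3:6].copy()
array[3:6, 3:6] = np.ma.masked
# Recompute the valid points so the hidden block is not used as input
x1 = xx[~array.mask]
y1 = yy[~array.mask]
newarr = array[~array.mask]
methods = ['linear', 'nearest', 'cubic']

for method in methods:
    GD1 = interpolate.griddata((x1, y1), newarr.ravel(),
                               (xx, yy),
                               method=method)
    # Cells that were already NaN stay masked in reference and are ignored
    meandifference = np.mean(np.abs(reference - GD1[3:6, 3:6]))
    print(' %s interpolation difference: %s' % (method, meandifference))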

That gives something like this:

   linear interpolation difference: 4.88888888889
   nearest interpolation difference: 4.11111111111
   cubic interpolation difference: 5.99400137377

Of course, this is random data, so the results can vary a lot from run to run. The best approach is to run the same test on a deliberately masked piece of your own dataset and see what happens.
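The hold-out test can also be written as a self-contained function that works on a plain array and hides an arbitrary block. This sketch is not from the original answer; `holdout_errors` is a hypothetical name, and the smooth test surface is made up purely for illustration:

```python
import numpy as np
from scipy import interpolate


def holdout_errors(data, block, methods=('linear', 'nearest', 'cubic')):
    """Hypothetical helper: hide data[block], re-interpolate it, return mean abs error per method."""
    data = np.asarray(data, dtype=float)
    xx, yy = np.meshgrid(np.arange(data.shape[1]), np.arange(data.shape[0]))
    hidden = np.zeros(data.shape, dtype=bool)
    hidden[block] = True
    # Only cells that are known and not hidden are used as input
    valid = ~np.isnan(data) & ~hidden
    # Only hidden cells that had a real value can be scored
    target = hidden & ~np.isnan(data)
    errors = {}
    for method in methods:
        guess = interpolate.griddata((xx[valid], yy[valid]), data[valid],
                                     (xx[target], yy[target]), method=method)
        errors[method] = np.mean(np.abs(guess - data[target]))
    return errors


# Illustrative smooth surface; replace with your own data
y, x = np.mgrid[0:10, 0:10]
surface = np.sin(x / 3.0) + np.cos(y / 4.0)
for method, err in holdout_errors(surface, np.s_[3:6, 3:6]).items():
    print(f'{method} interpolation difference: {err:.4f}')
```

On smooth data like this the comparison is more informative than on random integers, because there is an actual trend for the interpolator to follow.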


answered Oct 14 '22 08:10

G M
