I have a one-dimensional data set in which missing values are set to 9999. Here is an extract, as the full array is quite long:
this_array = [ 4, 4, 1, 9999, 9999, 9999, -5, -4, ... ]
I would like to replace each no-data value with the average of the closest valid values on either side. This is harder when a no-data value's neighbours are themselves no data. In the extract above, for example, I would like all three no-data values to be replaced with -2, the average of 1 and -5. I have created a loop that goes through each element of the array and tests for no data:
for k in range(len(this_array)):
    if this_array[k] == 9999:
        # average of the immediate neighbours on either side
        this_array[k] = (this_array[k-1] + this_array[k+1]) / 2
However, I need to add an if statement, or some other way to take the value before k-1 or after k+1 when that neighbour is also 9999, e.g.:
    if this_array[k-1] == 9999 or this_array[k+1] == 9999:
        this_array[k] = (this_array[k-2] + this_array[k+2]) / 2
As you can tell, this gets messy: it is easy to pick up the wrong value, and longer runs of no-data values need more and more nested if statements. Does anyone know a cleaner way to implement this, given that the length of the no-data runs varies throughout the data set?
Edit, as requested: if the first and/or last points are no data, they should preferably be replaced with the closest data point.
There may be a more efficient way to do this with NumPy functions, but here is a solution using the itertools module:
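from itertools import groupby

# this_array must be a NumPy array: .fill() on a slice writes through to it
for k, g in groupby(range(len(this_array)), lambda i: this_array[i] == 9999):
    if k:  # k is True for a run of consecutive no-data indices
        indices = list(g)
        new_v = (this_array[indices[0]-1] + this_array[indices[-1]+1]) / 2
        this_array[indices[0]:indices[-1]+1].fill(new_v)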
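To see what the groupby call works with, here is a small sketch that lists the runs it produces for the extract from the question (converted to a NumPy array). Each key is True for a run of consecutive 9999 indices and False otherwise, so the loop above only touches the no-data runs, and indices[0]-1 and indices[-1]+1 are the nearest valid neighbours. When a run starts at index 0, indices[0]-1 is -1, which NumPy reads as the last element, so the edges need special handling.

```python
from itertools import groupby

import numpy as np

this_array = np.array([4, 4, 1, 9999, 9999, 9999, -5, -4])

# Group the indices by whether the value at that index is the no-data marker;
# bool() turns NumPy's bool_ keys into plain Python booleans for printing.
runs = [(bool(k), list(g))
        for k, g in groupby(range(len(this_array)), lambda i: this_array[i] == 9999)]
print(runs)  # [(False, [0, 1, 2]), (True, [3, 4, 5]), (False, [6, 7])]
```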
If the first or last element can be 9999, you can use the following instead:
from itertools import groupby

for k, g in groupby(range(len(this_array)), lambda i: this_array[i] == 9999):
    if k:
        indices = list(g)
        prev_i, next_i = indices[0]-1, indices[-1]+1
        # at either edge only one neighbour exists, so use it for both sides
        before = this_array[prev_i] if prev_i != -1 else this_array[next_i]
        after = this_array[next_i] if next_i != len(this_array) else before
        this_array[indices[0]:next_i].fill((before + after) / 2)
Example using the second version:
>>> import numpy as np
>>> from itertools import groupby
>>> this_array = np.array([9999, 4, 1, 9999, 9999, 9999, -5, -4, 9999])
>>> for k, g in groupby(range(len(this_array)), lambda i: this_array[i] == 9999):
...     if k:
...         indices = list(g)
...         prev_i, next_i = indices[0]-1, indices[-1]+1
...         before = this_array[prev_i] if prev_i != -1 else this_array[next_i]
...         after = this_array[next_i] if next_i != len(this_array) else before
...         this_array[indices[0]:next_i].fill((before + after) / 2)
...
>>> this_array
array([ 4, 4, 1, -2, -2, -2, -5, -4, -4])
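For comparison with the loop in the question, here is a sketch of the same run-based idea written as a plain index loop, without itertools or NumPy, so it also works on an ordinary Python list. The function name fill_nodata and its nodata parameter are hypothetical; runs at either edge copy the single nearest valid value, as requested in the question.

```python
def fill_nodata(values, nodata=9999):
    """Hypothetical helper: replace each run of `nodata` with the mean of the
    nearest valid values on either side (or the one side at an edge)."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] != nodata:
            i += 1
            continue
        # Find the end of this run of consecutive no-data values: [i, j)
        j = i
        while j < n and out[j] == nodata:
            j += 1
        before = out[i - 1] if i > 0 else None
        after = out[j] if j < n else None
        if before is None and after is None:
            raise ValueError("no valid data in input")
        # At an edge only one neighbour exists, so use it for both sides
        if before is None:
            before = after
        if after is None:
            after = before
        for k in range(i, j):
            out[k] = (before + after) / 2
        i = j  # continue scanning after the run
    return out


print(fill_nodata([9999, 4, 1, 9999, 9999, 9999, -5, -4, 9999]))
```

The inner while loop skips over a whole run at once, so no nested if statements are needed however long the run is.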
I'd do something along the following lines:
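import numpy as np

def fill(arr, fwd_fill):
    # Replace every NaN with the nearest non-NaN value before it (fwd_fill=True)
    # or after it (fwd_fill=False); NaNs with nothing to copy from stay NaN.
    out = arr.copy()
    if fwd_fill:
        start, end, step = 0, len(out), 1
    else:
        start, end, step = len(out)-1, -1, -1
    cur = out[start]
    for i in range(start, end, step):
        if np.isnan(out[i]):
            out[i] = cur
        else:
            cur = out[i]
    return out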
def avg(arr):
    fwd = fill(arr, True)
    back = fill(arr, False)
    # at each NaN, average the nearest valid values before and after it;
    # all other elements are returned unchanged
    return np.where(np.isnan(arr), (fwd + back) / 2., arr)

arr = np.array([ 4, 4, 1, np.nan, np.nan, np.nan, -5, -4])
print(arr)
print(avg(arr))
The first function can do either a forward or a backward fill, replacing every NaN with the nearest non-NaN value in that direction.
Once you have that, computing the average is trivial, and is done by the second function: each NaN becomes the mean of its forward-filled and backward-filled values.
You don't say how you want the first and the last element handled, so the code does nothing special for them.
As a result, the function returns NaN at the start or end of the array if the first or last element of the input is missing (in that case there is no data on one side to compute the average from).
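Both answers loop in Python. If speed matters for a long array, the forward and backward fills can be vectorized: a running maximum over the indices of valid elements gives, for every position, the nearest valid index on the left, and a running minimum taken from the right gives the nearest valid index on the right. This is a sketch under my own assumptions: it works on the 9999 marker directly rather than NaN, fill_nodata_np is a hypothetical name, and runs at either end copy the nearest valid value as the question requests.

```python
import numpy as np


def fill_nodata_np(values, nodata=9999):
    """Hypothetical helper: replace each run of `nodata` with the mean of the
    nearest valid neighbours; runs touching either end copy the one neighbour."""
    a = np.asarray(values, dtype=float)
    n = len(a)
    valid = a != nodata
    if not valid.any():
        raise ValueError("no valid data in input")
    idx = np.arange(n)
    # Forward fill of indices: last valid index at or before each position (-1 if none)
    prev_idx = np.maximum.accumulate(np.where(valid, idx, -1))
    # Backward fill of indices: next valid index at or after each position (n if none)
    next_idx = np.minimum.accumulate(np.where(valid, idx, n)[::-1])[::-1]
    # At the edges only one side exists, so fall back to the other side
    prev_idx = np.where(prev_idx >= 0, prev_idx, next_idx)
    next_idx = np.where(next_idx < n, next_idx, prev_idx)
    # Keep valid values; replace no-data values with the neighbour mean
    return np.where(valid, a, (a[prev_idx] + a[next_idx]) / 2)


print(fill_nodata_np([9999, 4, 1, 9999, 9999, 9999, -5, -4, 9999]))
```

The result is always a float array; cast it back with astype(int) if you need integers.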