Python: Replacing values in an array

I have a one-dimensional data set in which no-data values are set to 9999. It is quite long, so here is an extract:

this_array = [   4,    4,    1, 9999, 9999, 9999,   -5,   -4, ... ]

I would like to replace each no-data value with the average of the closest valid values on either side. This is harder than it sounds, because the nearest neighbours of a no-data value are sometimes no-data values themselves. For example, I would like the three no-data values above to be replaced with -2 (the average of 1 and -5). I have written a loop that goes through each scalar in the array and tests for no data:

for k in this_array:
    if k == 9999:
        temp = np.where(k == 9999, (abs(this_array[k-1]-this_array[k+1])/2), this_array[k])
    else:
        pass
this_array[k] = temp

However, I need to add an if statement, or some other way to take the value before k-1 or after k+1 when that neighbour is also equal to 9999, e.g.:

if np.logical_or(k+1 == 9999, k-1 == 9999):
    temp = np.where(k == 9999, (abs(this_array[k-2]-this_array[k+2])/2), this_array[k])

As you can tell, this code gets messy: it is easy to take the wrong value or to end up with lots of nested if statements. Does anyone know of a cleaner way to implement this, given that the no-data gaps vary considerably throughout the dataset?

As requested: If the first and/or last points are no data, they would preferably be replaced with the closest data point.

AJEnvMap asked Dec 18 '12 21:12


2 Answers

There may be a more efficient way to do this with numpy functions, but here is a solution using the itertools module:

from itertools import groupby

for k, g in groupby(range(len(this_array)), lambda i: this_array[i] == 9999):
    if k:
        indices = list(g)
        new_v = (this_array[indices[0]-1] + this_array[indices[-1]+1]) / 2
        this_array[indices[0]:indices[-1]+1].fill(new_v)

If the first or last element can be 9999, you can use the following instead:

from itertools import groupby

for k, g in groupby(range(len(this_array)), lambda i: this_array[i] == 9999):
    if k:
        indices = list(g)
        prev_i, next_i = indices[0]-1, indices[-1]+1
        before = this_array[prev_i] if prev_i != -1 else this_array[next_i]
        after = this_array[next_i] if next_i != len(this_array) else before
        this_array[indices[0]:next_i].fill((before + after) / 2)

Example using second version:

>>> from itertools import groupby
>>> this_array = np.array([9999, 4, 1, 9999, 9999, 9999, -5, -4, 9999])
>>> for k, g in groupby(range(len(this_array)), lambda i: this_array[i] == 9999):
...     if k:
...         indices = list(g)
...         prev_i, next_i = indices[0]-1, indices[-1]+1
...         before = this_array[prev_i] if prev_i != -1 else this_array[next_i]
...         after = this_array[next_i] if next_i != len(this_array) else before
...         this_array[indices[0]:next_i].fill((before + after) / 2)
...
>>> this_array
array([ 4,  4,  1, -2, -2, -2, -5, -4, -4])
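Regarding the remark that NumPy functions might do this more efficiently, here is a minimal vectorized sketch. It is not part of the answer above: the function name fill_gaps_with_neighbor_mean and the nodata parameter are assumptions made for illustration. For every position it looks up the index of the nearest valid element on each side using np.maximum.accumulate and np.minimum.accumulate, and at the array edges it falls back to the only side that has data. Unlike the groupby loop, it returns a new float array rather than filling in place.

```python
import numpy as np

def fill_gaps_with_neighbor_mean(values, nodata=9999):
    """Replace nodata entries with the mean of the nearest valid value on each side.

    Hypothetical helper, not from the answer above. Returns a float array.
    """
    values = np.asarray(values, dtype=float)
    valid = values != nodata
    if not valid.any():
        raise ValueError("array contains no valid data")

    n = len(values)
    idx = np.arange(n)

    # Index of the nearest valid element at or before each position (-1 if none).
    prev_idx = np.maximum.accumulate(np.where(valid, idx, -1))
    # Index of the nearest valid element at or after each position (n if none).
    next_idx = np.minimum.accumulate(np.where(valid, idx, n)[::-1])[::-1]

    # Leading/trailing gaps have data on one side only: reuse that side.
    prev_idx = np.where(prev_idx == -1, next_idx, prev_idx)
    next_idx = np.where(next_idx == n, prev_idx, next_idx)

    mean = (values[prev_idx] + values[next_idx]) / 2.0
    return np.where(valid, values, mean)

this_array = np.array([9999, 4, 1, 9999, 9999, 9999, -5, -4, 9999])
print(fill_gaps_with_neighbor_mean(this_array).tolist())
# [4.0, 4.0, 1.0, -2.0, -2.0, -2.0, -5.0, -4.0, -4.0]
```

For the example array this gives the same values as the groupby loop, only as floats, so a gap whose neighbours average to a non-integer is not truncated.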
Andrew Clark answered Oct 23 '22 10:10


I'd do something along the following lines:

import numpy as np

def fill(arr, fwd_fill):
  out = arr.copy()
  if fwd_fill:
    start, end, step = 0, len(out), 1
  else:
    start, end, step = len(out)-1, -1, -1
  cur = out[start]
  for i in range(start, end, step):
    if np.isnan(out[i]):
      out[i] = cur
    else:
      cur = out[i]
  return out

def avg(arr):
  fwd = fill(arr, True)
  back = fill(arr, False)
  return (fwd[:-2] + back[2:]) / 2.

arr = np.array([   4,    4,    1, np.nan, np.nan, np.nan,   -5,   -4])
print(arr)
print(avg(arr))

The first function can do either a forward or a backward fill, replacing every NaN with the nearest non-NaN.

Once you have that, computing the average is trivial, and is done by the second function.

You don't say how you want the first and the last element handled, so the code just chops them off.

Finally, it is worth noting that avg can return NaNs if the first or the last element of the input array is missing (in which case there is no data to compute some of the averages).
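If the data uses the 9999 sentinel from the question and the edges should take the closest data point, the same forward/backward fill idea can be adapted roughly as follows. This is only a sketch under stated assumptions: directional_fill is a restatement of fill() above, and replace_missing and its nodata parameter are hypothetical names that do not appear in the answer. It converts the sentinel to NaN, keeps the array length, overwrites only the missing entries, and borrows the value from the other direction wherever one of the fills is still NaN.

```python
import numpy as np

def directional_fill(arr, forward):
    """Same idea as fill() above: propagate the last seen non-NaN value."""
    out = np.array(arr, dtype=float)
    order = range(len(out)) if forward else range(len(out) - 1, -1, -1)
    cur = np.nan  # nothing seen yet, so leading gaps stay NaN
    for i in order:
        if np.isnan(out[i]):
            out[i] = cur
        else:
            cur = out[i]
    return out

def replace_missing(arr, nodata=9999):
    """Hypothetical wrapper: only missing entries are overwritten."""
    data = np.array(arr, dtype=float)
    data[data == nodata] = np.nan  # treat the sentinel as NaN

    fwd = directional_fill(data, True)
    back = directional_fill(data, False)

    # At the edges one direction has no data; use the other direction there.
    fwd = np.where(np.isnan(fwd), back, fwd)
    back = np.where(np.isnan(back), fwd, back)

    return np.where(np.isnan(data), (fwd + back) / 2.0, data)

arr = [9999, 4, 1, 9999, 9999, 9999, -5, -4, 9999]
print(replace_missing(arr).tolist())
# [4.0, 4.0, 1.0, -2.0, -2.0, -2.0, -5.0, -4.0, -4.0]
```

If every element is missing there is nothing to borrow from, and the result is all NaN.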

NPE answered Oct 23 '22 10:10