How can I run a numpy function percentile() on a masked array?

I am trying to retrieve percentiles from an array with NoData values. In my case the NoData values are represented by -3.40282347e+38. I thought a masked array would exclude these values from further calculations. I successfully create the masked array, but the mask has no effect on the np.percentile() function.

>>> DataArray = np.array(data)
>>> DataArray

([[ value, value...]], dtype=float32)

>>> masked_data = ma.masked_where(DataArray < 0, DataArray)
>>> p5 = np.percentile(masked_data, 5)
>>> print p5

 -3.40282347e+38
EikeMike asked Jun 21 '16 05:06


2 Answers

If you fill the masked values with np.nan, you can then use np.nanpercentile, which ignores NaN entries:

import numpy as np
data = np.arange(-5.5,10.5) # Note that you need a non-integer array to store NaN
mdata = np.ma.masked_where(data < 0, data)
mdata = np.ma.filled(mdata, np.nan)
np.nanpercentile(mdata, 50) # 50th percentile
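
The same NaN-fill approach also works along an axis, which matters for 2-D data because compressing a masked array flattens it. A minimal sketch, assuming a small made-up raster (the array contents and the names NODATA, raster, and col_median are hypothetical; only the NoData value comes from the question):

```python
import numpy as np

# NoData sentinel quoted in the question; the raster values are made up.
NODATA = np.float32(-3.40282347e+38)
raster = np.array([[1.0, 2.0, NODATA],
                   [3.0, NODATA, 6.0],
                   [5.0, 4.0, 8.0]], dtype=np.float32)

# Mask the NoData cells, then replace them with NaN (float32 can hold NaN).
masked = np.ma.masked_where(raster < 0, raster)
filled = np.ma.filled(masked, np.nan)

# Per-column median that skips the NaN cells.
col_median = np.nanpercentile(filled, 50, axis=0)
print(col_median)
```

Each column's percentile is computed only from its valid cells, so columns with different numbers of NoData entries are handled independently.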
alphabetasoup answered Sep 26 '22 21:09


Looking at the np.percentile code, it is clear that it does nothing special with masked arrays:

def percentile(a, q, axis=None, out=None,
               overwrite_input=False, interpolation='linear', keepdims=False):
    q = array(q, dtype=np.float64, copy=True)
    r, k = _ureduce(a, func=_percentile, q=q, axis=axis, out=out,
                    overwrite_input=overwrite_input,
                    interpolation=interpolation)
    if keepdims:
        if q.ndim == 0:
            return r.reshape(k)
        else:
            return r.reshape([len(q)] + k)
    else:
        return r

Here _ureduce and _percentile are internal functions defined in numpy/lib/function_base.py, so the real action is more complex than this wrapper suggests.

Masked arrays have two strategies for using numpy functions. One is to fill: replace the masked values with innocuous ones, for example 0 when doing a sum or 1 when doing a product. The other is to compress the data, that is, remove all masked values.

For example:

In [997]: data=np.arange(-5,10)
In [998]: mdata=np.ma.masked_where(data<0,data)

In [1001]: np.ma.filled(mdata,0)
Out[1001]: array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [1002]: np.ma.filled(mdata,1)
Out[1002]: array([1, 1, 1, 1, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [1008]: mdata.compressed()
Out[1008]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Which is going to give you the desired percentile: filling, compressing, or neither? You need to understand the concept of a percentile well enough to know how it should apply to your masked values.
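
To make the difference concrete, here is a small sketch that computes the median both ways on the same example data. The variable names p50_compressed and p50_filled are only illustrative:

```python
import numpy as np

data = np.arange(-5, 10)
mdata = np.ma.masked_where(data < 0, data)

# Compress: drop the masked values, leaving only 0..9.
p50_compressed = np.percentile(mdata.compressed(), 50)

# Fill: replace the masked values with 0, so they still count as data points.
p50_filled = np.percentile(np.ma.filled(mdata, 0), 50)

print(p50_compressed)  # 4.5
print(p50_filled)      # 2.0
```

Filling with 0 pulls the median down because the five filled values are ranked along with the real ones. If the masked values are meant to be ignored entirely, compressing is probably closer to what is wanted.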

hpaulj answered Sep 24 '22 21:09