
How can I ignore zeros when I take the median on columns of an array?

I have a simple numpy array.

array([[10,   0,  10,  0],
       [ 1,   1,   0,  0],
       [ 9,   9,   9,  0],
       [ 0,  10,   1,  0]])

I would like to take the median of each column, individually, of this array.

However, there are a few 0 values in various places which I would like to ignore in the calculation of the medians.

To complicate things further, I would like columns that contain only 0 entries to keep a median of 0. That way those columns act as placeholders, and the result has one entry per column of the original matrix.

The documentation for numpy.median doesn't list any argument that would do what I want (maybe I am spoiled by the many switches we get with R!):

numpy.median(a, axis=None, out=None, overwrite_input=False)

Can someone please shed some light on an effective way to do this, which is in line with the spirit of numpy? I could hack it out but in that case I feel like I've defeated the purpose of using numpy in the first place.

Thanks in advance.

tumultous_rooster asked Feb 26 '14 17:02



3 Answers

Masked arrays are always handy, but slow. Here x is the array from the question and y = np.ma.masked_where(x == 0, x):

In [14]:

%timeit np.ma.median(y, axis=0).filled(0)
1000 loops, best of 3: 1.73 ms per loop
In [15]:

%%timeit
ans=np.apply_along_axis(lambda v: np.median(v[v!=0]), 0, x)
ans[np.isnan(ans)]=0.
1000 loops, best of 3: 402 µs per loop

In [16]:

ans=np.apply_along_axis(lambda v: np.median(v[v!=0]), 0, x)
ans[np.isnan(ans)]=0.; ans
Out[16]:
array([ 9.,  9.,  9.,  0.])
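
The lambda above relies on np.median returning nan for an empty selection, which typically also emits a RuntimeWarning for the all-zero column. A minimal sketch that handles the empty case explicitly instead might look like this (the helper name nonzero_median is made up for illustration and is not part of NumPy):

```python
import numpy as np

def nonzero_median(v):
    """Median of the nonzero entries of a 1-D array, or 0.0 if there are none."""
    nz = v[v != 0]
    # An empty selection means the column was all zeros: keep 0 as placeholder.
    return np.median(nz) if nz.size else 0.0

x = np.array([[10, 0, 10, 0], [1, 1, 0, 0], [9, 9, 9, 0], [0, 10, 1, 0]])

# Apply the helper down each column (axis 0).
ans = np.apply_along_axis(nonzero_median, 0, x)
print(ans)
```

This yields the same result as Out[16] without the nan clean-up step.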

Indexing with np.nonzero is marginally faster:

In [25]:

%%timeit
ans=np.apply_along_axis(lambda v: np.median(v[np.nonzero(v)]), 0, x)
ans[np.isnan(ans)]=0.
1000 loops, best of 3: 384 µs per loop
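
If NumPy 1.9 or later is available, another option is to avoid apply_along_axis entirely: replace the zeros with NaN and let np.nanmedian skip them. This is a different technique from the ones timed above, shown only as a sketch; the function name nonzero_median_nan is hypothetical, and no timing is claimed for it.

```python
import warnings
import numpy as np

def nonzero_median_nan(a):
    """Column medians ignoring zeros; all-zero columns give 0."""
    # Work on a float copy so zeros can be replaced by NaN.
    b = np.array(a, dtype=float)
    b[b == 0] = np.nan
    with warnings.catch_warnings():
        # nanmedian warns on all-NaN columns; the NaN is handled below.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        med = np.nanmedian(b, axis=0)
    # Columns that were all zeros come back as NaN; map them to 0.
    return np.where(np.isnan(med), 0.0, med)

x = np.array([[10, 0, 10, 0], [1, 1, 0, 0], [9, 9, 9, 0], [0, 10, 1, 0]])
print(nonzero_median_nan(x))
```

Note that this also treats any NaN already present in the input as missing, which may or may not be what you want.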
CT Zhu answered Sep 22 '22 23:09


Mask the zeros and use np.ma.median(y, axis=0).filled(0) to get the medians of the columns. A column that is entirely masked has a masked median, which filled(0) turns into 0.

In [1]: x = np.array([[10, 0, 10, 0], [1, 1, 0, 0], [9, 9, 9, 0], [0, 10, 1, 0]])
In [2]: y = np.ma.masked_where(x == 0, x)
In [3]: x
Out[3]: 
array([[10,  0, 10, 0],
       [ 1,  1,  0, 0],
       [ 9,  9,  9, 0],
       [ 0, 10,  1, 0]])
In [4]: y
Out[4]: 
masked_array(data =
 [[10 -- 10 --]
 [1 1 -- --]
 [9 9 9 --]
 [-- 10 1 --]],
             mask =
 [[False  True False True]
 [False False  True True]
 [False False False True]
 [ True False False True]],
       fill_value = 999999)
In [6]: np.median(x, axis=0)
Out[6]: array([ 5.,  5.,  5., 0.])
In [7]: np.ma.median(y, axis=0).filled(0)
Out[7]: 
array([ 9.,  9.,  9.,  0.])
wflynny answered Sep 19 '22 23:09


You can use masked arrays.

a = np.array([[10, 0, 10, 0], [1, 1, 0, 0],[9,9,9,0],[0,10,1,0]])
m = np.ma.masked_equal(a, 0)

In [44]: np.median(a)
Out[44]: 1.0

In [45]: np.ma.median(m)
Out[45]: 9.0

In [46]: m
Out[46]:
masked_array(data =
 [[10 -- 10 --]
 [1 1 -- --]
 [9 9 9 --]
 [-- 10 1 --]],
             mask =
 [[False  True False  True]
 [False False  True  True]
 [False False False  True]
 [ True False False  True]],
       fill_value = 0)
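
The calls above take the median over the whole array. For the per-column medians the question asks for, a sketch along the same lines would pass axis=0 and fill the fully masked column with 0 (the variable name col_medians is just for illustration):

```python
import numpy as np

a = np.array([[10, 0, 10, 0], [1, 1, 0, 0], [9, 9, 9, 0], [0, 10, 1, 0]])
m = np.ma.masked_equal(a, 0)  # mask every zero entry

# Median down each column, skipping masked entries. The last column is
# entirely masked, so its median is masked and filled(0) turns it into 0.
col_medians = np.ma.median(m, axis=0).filled(0)
print(col_medians)
```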
M4rtini answered Sep 19 '22 23:09