How can I ignore zeros when I take the median on columns of an array?

I have a simple numpy array:

array([[10,  0, 10,  0],
       [ 1,  1,  0,  0],
       [ 9,  9,  9,  0],
       [ 0, 10,  1,  0]])

I would like to take the median of each column of this array individually.

However, there are a few 0 values scattered throughout that I would like to ignore when calculating the medians.

To complicate things further, I would like columns that contain only 0 entries to have a median of 0. That way, those columns serve as placeholders, and the result keeps one entry for every column of the matrix.

The numpy documentation doesn't show any argument that would do what I want (maybe I am spoiled by the many switches we get with R!):

numpy.median(a, axis=None, out=None, overwrite_input=False)

Can someone please shed some light on an effective way to do this that is in line with the spirit of numpy? I could hack it out, but then I feel like I'd be defeating the purpose of using numpy in the first place.
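For reference, the plain-loop version of the behaviour described above might look like the sketch below. The function name column_medians_ignoring_zeros is made up for illustration, and the standard library's statistics.median stands in for numpy.median. It pins down the expected result for the example array, which is handy for checking vectorized solutions against.

```python
import statistics

import numpy as np


def column_medians_ignoring_zeros(a):
    """Hypothetical loop-based reference: median of the nonzero entries of each column.

    Columns that contain only zeros get a median of 0.
    """
    result = []
    for col in np.asarray(a).T:  # .T lets us iterate over columns
        nonzero = [v for v in col.tolist() if v != 0]
        # an all-zero column has nothing left to take the median of, so use 0
        result.append(float(statistics.median(nonzero)) if nonzero else 0.0)
    return np.array(result)


x = np.array([[10, 0, 10, 0],
              [1, 1, 0, 0],
              [9, 9, 9, 0],
              [0, 10, 1, 0]])
print(column_medians_ignoring_zeros(x))  # [9. 9. 9. 0.]
```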

Thanks in advance.


asked Feb 26 '14 17:02

tumultous_rooster

3 Answers

Masked arrays are always handy, but slooooooow (here x is the original array and y = np.ma.masked_where(x == 0, x)):

In [14]: %timeit np.ma.median(y, axis=0).filled(0)
1000 loops, best of 3: 1.73 ms per loop

In [15]: %%timeit
    ...: ans = np.apply_along_axis(lambda v: np.median(v[v != 0]), 0, x)
    ...: ans[np.isnan(ans)] = 0.
1000 loops, best of 3: 402 µs per loop

In [16]: ans = np.apply_along_axis(lambda v: np.median(v[v != 0]), 0, x)
    ...: ans[np.isnan(ans)] = 0.; ans
Out[16]: array([ 9.,  9.,  9.,  0.])

np.nonzero is even faster:

In [25]: %%timeit
    ...: ans = np.apply_along_axis(lambda v: np.median(v[np.nonzero(v)]), 0, x)
    ...: ans[np.isnan(ans)] = 0.
1000 loops, best of 3: 384 µs per loop
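A self-contained version of the apply_along_axis approach is sketched below. The helper name median_nonzero is hypothetical. Instead of patching NaNs afterwards, it checks for an empty slice directly, which also avoids the RuntimeWarning that np.median typically emits when handed an empty array (as happens for an all-zero column).

```python
import numpy as np


def median_nonzero(v):
    """Hypothetical helper: median of the nonzero entries of a 1-D array, or 0.0 if none."""
    nz = v[np.nonzero(v)]
    # np.median of an empty array would warn and return nan, so handle that case explicitly
    return float(np.median(nz)) if nz.size else 0.0


x = np.array([[10, 0, 10, 0],
              [1, 1, 0, 0],
              [9, 9, 9, 0],
              [0, 10, 1, 0]])

# axis 0: median_nonzero is called once per column
ans = np.apply_along_axis(median_nonzero, 0, x)
print(ans)  # [9. 9. 9. 0.]
```

Note that apply_along_axis still loops in Python under the hood, so this is about readability rather than speed.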

answered Sep 22 '22 23:09

CT Zhu

Use a masked array that hides the zeros, then call np.ma.median(y, axis=0).filled(0) to get the medians of the columns.

In [1]: x = np.array([[10, 0, 10, 0], [1, 1, 0, 0], [9, 9, 9, 0], [0, 10, 1, 0]])
In [2]: y = np.ma.masked_where(x == 0, x)
In [3]: x
Out[3]: 
array([[10,  0, 10, 0],
       [ 1,  1,  0, 0],
       [ 9,  9,  9, 0],
       [ 0, 10,  1, 0]])
In [4]: y
Out[4]: 
masked_array(data =
 [[10 -- 10 --]
 [1 1 -- --]
 [9 9 9 --]
 [-- 10 1 --]],
             mask =
 [[False  True False True]
 [False False  True True]
 [False False False True]
 [ True False False True]],
       fill_value = 999999)
In [6]: np.median(x, axis=0)
Out[6]: array([ 5.,  5.,  5., 0.])
In [7]: np.ma.median(y, axis=0).filled(0)
Out[7]: array([ 9.,  9.,  9.,  0.])
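The masked-array approach can be wrapped into a self-contained function. This is a sketch rather than part of the answer itself, and masked_column_medians is a hypothetical name. A column whose entries are all masked has a masked median, and .filled(0) both replaces it with 0 and returns a plain ndarray.

```python
import numpy as np


def masked_column_medians(a):
    """Hypothetical wrapper: per-column median ignoring zeros; all-zero columns give 0."""
    # Mask every entry equal to 0 so np.ma.median skips it
    m = np.ma.masked_equal(a, 0)
    # Fully masked columns come back masked; .filled(0) turns them into 0
    return np.ma.median(m, axis=0).filled(0)


x = np.array([[10, 0, 10, 0],
              [1, 1, 0, 0],
              [9, 9, 9, 0],
              [0, 10, 1, 0]])
result = masked_column_medians(x)
print(result)  # [9. 9. 9. 0.]
```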

answered Sep 19 '22 23:09

wflynny

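You can use masked arrays: np.ma.masked_equal(a, 0) masks every zero, and np.ma.median skips masked entries. The example here takes the median of the whole array; pass axis=0 to np.ma.median to get one median per column.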

import numpy as np

a = np.array([[10, 0, 10, 0], [1, 1, 0, 0], [9, 9, 9, 0], [0, 10, 1, 0]])
m = np.ma.masked_equal(a, 0)

In [44]: np.median(a)
Out[44]: 1.0

In [45]: np.ma.median(m)
Out[45]: 9.0

In [46]: m
Out[46]:
masked_array(data =
 [[10 -- 10 --]
 [1 1 -- --]
 [9 9 9 --]
 [-- 10 1 --]],
             mask =
 [[False  True False  True]
 [False False  True  True]
 [False False False  True]
 [ True False False  True]],
       fill_value = 0)
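A masked-array-free alternative, not taken from any of the answers, is to turn the zeros into NaN and use np.nanmedian. This is a sketch: nan_column_medians is a hypothetical name, and it assumes the input contains no NaNs of its own (those would be ignored as well). An all-NaN column makes np.nanmedian warn and return NaN, so the warning is silenced locally and the NaN is mapped back to 0.

```python
import warnings

import numpy as np


def nan_column_medians(a):
    """Hypothetical helper: per-column median ignoring zeros via np.nanmedian."""
    # np.array (not np.asarray) always makes a float copy, so the caller's array is untouched
    f = np.array(a, dtype=float)
    f[f == 0] = np.nan  # zeros become NaN, which np.nanmedian skips
    with warnings.catch_warnings():
        # An all-NaN column triggers an "All-NaN slice" RuntimeWarning; silence it only here
        warnings.simplefilter("ignore", category=RuntimeWarning)
        med = np.nanmedian(f, axis=0)
    # Columns that were all zeros come back as NaN; report them as 0
    return np.where(np.isnan(med), 0.0, med)


x = np.array([[10, 0, 10, 0],
              [1, 1, 0, 0],
              [9, 9, 9, 0],
              [0, 10, 1, 0]])
print(nan_column_medians(x))  # [9. 9. 9. 0.]
```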

answered Sep 19 '22 23:09

M4rtini


Tags: python, arrays, numpy, zero, median