NumPy: calculate averages with NaNs removed

Tags: python, nan, numpy

How can I calculate mean values along an axis of a matrix while excluding nan values from the calculation? (For R people, think na.rm = TRUE).

Here is my [non-]working example:

    import numpy as np

    dat = np.array([[1, 2, 3],
                    [4, 5, np.nan],
                    [np.nan, 6, np.nan],
                    [np.nan, np.nan, np.nan]])
    print(dat)
    print(dat.mean(1))  # [  2.  nan  nan  nan]

With NaNs removed, my expected output would be:

array([ 2.,  4.5,  6.,  nan])

asked Mar 30 '11 00:03

Mike T

2 Answers

I think what you want is a masked array:

    import numpy as np

    dat = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan], [np.nan, np.nan, np.nan]])
    # Mask every NaN entry so the mean ignores it
    mdat = np.ma.masked_array(dat, np.isnan(dat))
    mm = np.mean(mdat, axis=1)
    print(mm.filled(np.nan))  # the desired answer
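To see why the masked array gives the expected result, here is a minimal sketch that counts how many unmasked values each row contributes. It uses np.ma.masked_invalid as a shortcut (it masks inf as well as NaN, which makes no difference for this data); the variable names valid_per_row and row_means are only illustrative.

```python
import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

# masked_invalid masks NaN (and inf) entries; for this data it behaves
# like np.ma.masked_array(dat, np.isnan(dat)).
mdat = np.ma.masked_invalid(dat)

# count() reports how many unmasked values each row contributes to the mean.
valid_per_row = mdat.count(axis=1)

# A row with no valid values comes back masked; filled() turns it into NaN.
row_means = mdat.mean(axis=1).filled(np.nan)

print(valid_per_row)
print(row_means)
```

The last row has zero valid entries, so its mean is masked rather than computed, and filled(np.nan) is what converts that into the nan from the question.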

Edit: Combining all of the timing data

    from timeit import Timer

    setupstr = """
    import numpy as np
    try:
        # The timings below were measured with SciPy's nanmean
        from scipy.stats import nanmean
    except ImportError:
        # SciPy later dropped nanmean; NumPy >= 1.8 provides its own
        from numpy import nanmean
    dat = np.random.normal(size=(1000,1000))
    ii = np.ix_(np.random.randint(0,99,size=50),np.random.randint(0,99,size=50))
    dat[ii] = np.nan
    """

    method1 = """
    mdat = np.ma.masked_array(dat,np.isnan(dat))
    mm = np.mean(mdat,axis=1)
    mm.filled(np.nan)
    """

    N = 2
    t1 = Timer(method1, setupstr).timeit(N)
    t2 = Timer("[np.mean([l for l in d if not np.isnan(l)]) for d in dat]", setupstr).timeit(N)
    t3 = Timer("np.array([r[np.isfinite(r)].mean() for r in dat])", setupstr).timeit(N)
    t4 = Timer("np.ma.masked_invalid(dat).mean(axis=1)", setupstr).timeit(N)
    t5 = Timer("nanmean(dat,axis=1)", setupstr).timeit(N)

    # Report each time and its ratio to the masked-array method (t1)
    for t in (t1, t2, t3, t4, t5):
        print('Time: %f\tRatio: %f' % (t, t / t1))

Returns:

    Time: 0.045454  Ratio: 1.000000
    Time: 8.179479  Ratio: 179.950595
    Time: 0.060988  Ratio: 1.341755
    Time: 0.070955  Ratio: 1.561029
    Time: 0.065152  Ratio: 1.433364
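For anyone reading this on NumPy 1.8 or later: np.nanmean exists there and takes an axis argument, so it may be the simplest option today. The sketch below is not part of the timings above; it only checks, on the data from the question, that np.nanmean agrees with the masked-array and row-filtering approaches. The warning suppression is my own addition, since the all-NaN row triggers a RuntimeWarning.

```python
import warnings

import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

with warnings.catch_warnings():
    # The all-NaN row makes NumPy warn about the mean of an empty slice.
    warnings.simplefilter("ignore", category=RuntimeWarning)
    via_nanmean = np.nanmean(dat, axis=1)
    via_filter = np.array([r[np.isfinite(r)].mean() for r in dat])

via_masked = np.ma.masked_invalid(dat).mean(axis=1).filled(np.nan)

# equal_nan=True treats the NaN for the all-NaN row as matching.
same = (np.allclose(via_nanmean, via_masked, equal_nan=True)
        and np.allclose(via_nanmean, via_filter, equal_nan=True))
print(via_nanmean, same)
```

Relative speed will depend on array size and library versions, so it is worth re-running the timing script on your own setup before picking one.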

answered Oct 06 '22 02:10

JoshAdel

If performance matters, you should use bottleneck.nanmean() instead:

http://pypi.python.org/pypi/Bottleneck
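A minimal usage sketch, assuming Bottleneck is installed and imported as bn. The fallback to np.nanmean and the warning filter are only there so the snippet still runs without the package; they are not part of Bottleneck.

```python
import warnings

import numpy as np

try:
    import bottleneck as bn  # optional third-party package
    nanmean = bn.nanmean
except ImportError:
    # Assumption for this sketch: fall back to NumPy when Bottleneck is absent.
    nanmean = np.nanmean

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

with warnings.catch_warnings():
    # The NumPy fallback warns on the all-NaN row; silence that here.
    warnings.simplefilter("ignore", category=RuntimeWarning)
    row_means = nanmean(dat, axis=1)

print(row_means)
```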

answered Oct 06 '22 00:10

deprecated
