Treating NaN as zero in arithmetic operations?

Tags: python, pandas, nan

Here's a simple example of the sort of thing I'm wrestling with:

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: test = pd.DataFrame(np.random.randn(4,4),columns=list('ABCD'))
In [4]: for i in range(4):
  ....:    test.iloc[i,i] = np.nan

In [5]: test
Out[5]:
           A         B         C         D
0        NaN  0.136841 -0.854138 -1.890888
1  -1.261724       NaN  0.875647  1.312823
2   1.130999 -0.208402       NaN  0.256644
3  -0.158458 -0.305250  0.902756       NaN

Now, if I use sum to sum the rows, all the NaN values are treated as zeros:

In [6]: test['Sum'] = test.loc[:,'A':'D'].sum(axis=1)

In [7]: test
Out[7]: 
          A         B         C         D       Sum
0       NaN  0.136841 -0.854138 -1.890888 -2.608185
1 -1.261724       NaN  0.875647  1.312823  0.926745
2  1.130999 -0.208402       NaN  0.256644  1.179241
3 -0.158458 -0.305250  0.902756       NaN  0.439048

But in my case, I may need to do a bit of work on the values first; for example scaling them:

In [8]: test['Sum2'] = test.A + test.B/2 - test.C/3 + test.D

In [9]: test
Out[9]: 
          A         B         C         D       Sum  Sum2
0       NaN  0.136841 -0.854138 -1.890888 -2.608185   NaN
1 -1.261724       NaN  0.875647  1.312823  0.926745   NaN
2  1.130999 -0.208402       NaN  0.256644  1.179241   NaN
3 -0.158458 -0.305250  0.902756       NaN  0.439048   NaN

As you see, the NaN values carry across into the arithmetic to produce NaN output, which is what you'd expect.

Now, I don't want to replace all NaN values in my dataframe with zeros: it is helpful to me to distinguish between zero and NaN. I could replace NaN with something else: I'm dealing with large volumes of student grades, and I need to distinguish between a grade of zero and a NaN, which at the moment I'm using to indicate that the particular assessment task was not attempted. (It takes the place of what would be a blank cell in a traditional spreadsheet.) But whatever I replace the NaN values with, it needs to be something that can be treated as zero in the operations I may perform. What are my options here?

asked Dec 02 '17 09:12

Alasdair

2 Answers

Use the fillna function

test['Sum2'] = test.A.fillna(0) + test.B.fillna(0)/2 - test.C.fillna(0)/3 + test.D.fillna(0)
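
As a small check of why this fits the question: `fillna(0)` returns a new Series each time, so the NaN markers stored in `test` are left alone and can still be told apart from real zeros. The sketch below assumes a frame shaped like the one in the question, but with fixed illustrative values rather than the random ones, and `n_missing` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Illustrative fixed values (assumed, not the random data from the question)
test = pd.DataFrame(
    {
        "A": [np.nan, 1.0, 2.0, 3.0],
        "B": [2.0, np.nan, 4.0, 6.0],
        "C": [3.0, 6.0, np.nan, 9.0],
        "D": [1.0, 1.0, 1.0, np.nan],
    }
)

# NaN is treated as zero only inside this expression
test["Sum2"] = (
    test.A.fillna(0) + test.B.fillna(0) / 2 - test.C.fillna(0) / 3 + test.D.fillna(0)
)

# The original columns still hold NaN, so unattempted tasks can be counted per row
n_missing = test[list("ABCD")].isna().sum(axis=1)
print(test)
print(n_missing)
```

Here `Sum2` has no NaN values, while each row of columns A to D still reports one missing entry.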

answered Oct 29 '22 14:10

kmcodes

If the dataframe is not huge you can try:

test["Sum"] = test.sum(axis=1)
test2 = test.fillna(0)
test["Sum2"] = test2.A + test2.B/2 - test2.C/3 + test2.D
del test2

It would be interesting to know whether there is a way to do the second sum in a single line.

Update

With 1e5 rows or fewer, the method I suggested is slightly faster than the one suggested by kmcodes; beyond that, things change.

n = int(1e5)
test = pd.DataFrame(np.random.randn(n,4),columns=list('ABCD'))
for i in range(4):
    test.iloc[i,i] = np.nan

%%timeit
test2 = test.fillna(0)
test["Sum2"] = test2.A + test2.B/2 - test2.C/3 + test2.D
del test2
3.95 ms ± 51.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
test['Sum2'] = test.A.fillna(0) + test.B.fillna(0)/2 - test.C.fillna(0)/3 + test.D.fillna(0)
4.12 ms ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
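
The `%%timeit` cells above only run inside IPython or Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard-library `timeit` module. The function names `fill_once` and `fill_per_column` are hypothetical, the seed is arbitrary, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n = int(1e5)
rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
test = pd.DataFrame(rng.standard_normal((n, 4)), columns=list("ABCD"))
for i in range(4):
    test.iloc[i, i] = np.nan


def fill_once(df):
    # Fill the whole frame once, then combine the columns
    filled = df.fillna(0)
    return filled.A + filled.B / 2 - filled.C / 3 + filled.D


def fill_per_column(df):
    # Fill each column separately inside the expression
    return df.A.fillna(0) + df.B.fillna(0) / 2 - df.C.fillna(0) / 3 + df.D.fillna(0)


runs = 20
t_once = timeit.timeit(lambda: fill_once(test), number=runs)
t_per_col = timeit.timeit(lambda: fill_per_column(test), number=runs)
print(f"fill once:       {t_once / runs * 1e3:.2f} ms per call")
print(f"fill per column: {t_per_col / runs * 1e3:.2f} ms per call")
```

Returning the result instead of assigning it to `test["Sum2"]` keeps the input frame unchanged between runs, so both functions are timed on the same data.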

Update 2

I found this

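In your case you can just do:

weights = [1, 1/2, -1/3, 1]
# Select A-D explicitly: with extra columns such as Sum present, the 4 weights would not match
test["Sum2"] = test[list("ABCD")].fillna(0).mul(weights).sum(axis=1)

Keep in mind that this seems to be consistently slower than the other two.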
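
A plain list of weights depends on the column order and on the frame having exactly as many columns as weights. One possible variation, not taken from the answer above, is to key the weights by column name in a Series so that pandas aligns them by label. The sketch below assumes fixed illustrative data and an extra `Sum` column to show that unrelated columns are ignored.

```python
import numpy as np
import pandas as pd

# Illustrative fixed values (assumed, not the random data from the question)
test = pd.DataFrame(
    {
        "A": [np.nan, 1.0, 2.0, 3.0],
        "B": [2.0, np.nan, 4.0, 6.0],
        "C": [3.0, 6.0, np.nan, 9.0],
        "D": [1.0, 1.0, 1.0, np.nan],
    }
)
test["Sum"] = test.sum(axis=1)  # extra column that must not be weighted

# Weights keyed by column name; the index doubles as the column selection
weights = pd.Series({"A": 1.0, "B": 1 / 2, "C": -1 / 3, "D": 1.0})

# Multiply each selected column by its weight (aligned on labels), then sum the rows
test["Sum2"] = test[weights.index].fillna(0).mul(weights, axis=1).sum(axis=1)
print(test)
```

Because the selection comes from `weights.index`, adding or reordering other columns in `test` does not change the result.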

answered Oct 29 '22 16:10

rpanai
