
Adding two Series with NaNs


I'm working through "Python for Data Analysis" and I don't understand one particular behavior. Adding two pandas Series objects automatically aligns the data by index label, but if a label is missing from one of the objects, the result for that label is NaN. For example, from the book:

    from numpy import nan as NaN
    from pandas import Series

    a = Series([35000, 71000, 16000, 5000], index=['Ohio', 'Texas', 'Oregon', 'Utah'])
    b = Series([NaN, 71000, 16000, 35000], index=['California', 'Texas', 'Oregon', 'Ohio'])

Result:

    In [63]: a
    Out[63]: Ohio          35000
             Texas         71000
             Oregon        16000
             Utah           5000

    In [64]: b
    Out[64]: California      NaN
             Texas         71000
             Oregon        16000
             Ohio          35000

When I add them together I get this...

    In [65]: a+b
    Out[65]: California       NaN
             Ohio           70000
             Oregon         32000
             Texas         142000
             Utah             NaN

So why is the Utah value NaN and not 5000? I expected 5000 + NaN = 5000. What gives? I'm missing something, please explain.

Update:

    In [92]: # fill NaN with zero
             b = b.fillna(0)
             b
    Out[92]: California        0
             Texas         71000
             Oregon        16000
             Ohio          35000

    In [93]: a
    Out[93]: Ohio      35000
             Texas     71000
             Oregon    16000
             Utah       5000

    In [94]: # a is still good
             a+b
    Out[94]: California       NaN
             Ohio           70000
             Oregon         32000
             Texas         142000
             Utah             NaN

asked Apr 24 '13 21:04 by BubbleGuppies






2 Answers

Pandas does not assume that 5000 + NaN = 5000, but it is easy to ask it to do that:

    a.add(b, fill_value=0)
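
As a rough sketch of how this plays out with the Series from the question (the variable names `plain`, `still_nan`, and `filled` are only illustrative and do not come from the original post), the following compares plain addition, the `fillna(0)` attempt from the update, and `add` with `fill_value=0`:

```python
import numpy as np
import pandas as pd

a = pd.Series([35000, 71000, 16000, 5000],
              index=["Ohio", "Texas", "Oregon", "Utah"])
b = pd.Series([np.nan, 71000, 16000, 35000],
              index=["California", "Texas", "Oregon", "Ohio"])

# Plain addition aligns on the union of labels; a label missing from either
# side is treated as NaN, so "Utah" (absent from b) comes out as NaN.
plain = a + b

# b.fillna(0) only replaces NaN values already stored in b. It cannot add a
# "Utah" entry to b, so the aligned sum for "Utah" is still NaN.
still_nan = a + b.fillna(0)

# fill_value=0 substitutes 0 when a value is missing on exactly one side.
filled = a.add(b, fill_value=0)

print(filled)
```

With this data, "Utah" should come out as 5000 in `filled`. "California" stays NaN even with `fill_value=0`, because it is missing from `a` and is NaN in `b`, so there is no real value on either side to add to.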

answered Sep 23 '22 08:09 by Dan Allan


The default approach is to assume that any computation involving NaN gives NaN as the result. Anything plus NaN is NaN, anything divided by NaN is NaN, etc. If you want to fill the NaN with some value, you have to do that explicitly (as Dan Allan showed in his answer).
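
A small sketch of that propagation rule, using made-up values rather than the data from the question. It also shows, for contrast, that reductions such as `Series.sum` skip NaN by default, which may be why one would expect 5000 + NaN to give 5000:

```python
import math

import numpy as np
import pandas as pd

nan = float("nan")

# Element-wise arithmetic: any operation involving NaN yields NaN.
added = 5000 + nan
divided = 5000 / nan
print(added, divided)

s = pd.Series([5000, np.nan])
elementwise = s + 1  # the NaN entry stays NaN

# Reductions behave differently: sum() skips NaN unless skipna=False.
total_default = s.sum()
total_strict = s.sum(skipna=False)
print(total_default, total_strict)
```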

answered Sep 24 '22 08:09 by BrenBarn