
Pandas sum two columns, skipping NaN

Tags:

python

pandas

If I add two columns to create a third, any row containing NaN (which represents missing data in my world) makes the corresponding value in the output column NaN as well. Is there a way to skip NaNs without explicitly setting the values to 0 (which would lose the notion that those values are "missing")?

In [42]: frame = pd.DataFrame({'a': [1, 2, np.nan], 'b': [3, np.nan, 4]})

In [44]: frame['c'] = frame['a'] + frame['b']

In [45]: frame
Out[45]:
     a   b   c
0    1   3   4
1    2 NaN NaN
2  NaN   4 NaN

In the above, I would like column c to be [4, 2, 4].

Thanks...

smontanaro asked Jun 24 '14 12:06


People also ask

Does pandas sum ignore NaN?

Yes. By default, sum() uses skipna=True, which ignores NaN values when summing along the specified axis. If you set skipna=False, any row or column that contains a NaN produces a NaN sum.
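As a minimal sketch of the skipna behaviour, the snippet below reuses the frame from the question and sums each row both ways. The variable names row_sums_skip and row_sums_strict are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as in the question above.
frame = pd.DataFrame({"a": [1, 2, np.nan], "b": [3, np.nan, 4]})

# Default (skipna=True): NaN is ignored within each row.
row_sums_skip = frame.sum(axis=1)

# skipna=False: any NaN in a row makes that row's sum NaN.
row_sums_strict = frame.sum(axis=1, skipna=False)

print(row_sums_skip.tolist())    # [4.0, 2.0, 4.0]
print(row_sums_strict.tolist())  # [4.0, nan, nan]
```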

How do I skip NaN in pandas?

Use the dropna() function to drop rows with NaN / None values from a pandas DataFrame.
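A short sketch of dropna() on the question's frame follows. Note that dropping rows discards the non-missing values in those rows too, so it is probably not what the question is after; the names complete_rows and rows_with_a are hypothetical.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"a": [1, 2, np.nan], "b": [3, np.nan, 4]})

# Drop every row that has a NaN in any column: only row 0 survives.
complete_rows = frame.dropna()

# Drop rows only when column 'a' is NaN: rows 0 and 1 survive.
rows_with_a = frame.dropna(subset=["a"])

print(complete_rows)
print(rows_with_a)
```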

How do I sum two columns in pandas DataFrame?

Select the two columns with loc[] (for example 'Jan' and 'Feb'), which gives a smaller DataFrame containing only those columns. Then call sum() with axis=1, which adds the values across those columns in each row and returns a Series that can be assigned as a new column.
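The loc[] approach described above could look roughly like this. The sales frame, its values, and the 'Jan_Feb' column name are made up for illustration; only the 'Jan' and 'Feb' column names come from the text.

```python
import pandas as pd

# Hypothetical monthly data; only 'Jan' and 'Feb' are summed.
sales = pd.DataFrame({
    "Jan": [10, 20, 30],
    "Feb": [5, 15, 25],
    "Mar": [1, 2, 3],
})

# loc[:, [...]] selects all rows and just the two named columns;
# sum(axis=1) then adds across those columns, row by row.
sales["Jan_Feb"] = sales.loc[:, ["Jan", "Feb"]].sum(axis=1)

print(sales["Jan_Feb"].tolist())  # [15, 35, 55]
```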

Why am I getting NaN in pandas?

In applied data science, you will usually have missing data. For example, an industrial application with sensors will have sensor data that is missing on certain days. You have a couple of alternatives to work with missing data.


2 Answers

With fillna():

frame['c'] = frame.fillna(0)['a'] + frame.fillna(0)['b']

Or, as suggested:

frame['c'] = frame.a.fillna(0) + frame.b.fillna(0)

giving:

     a   b  c
0    1   3  4
1    2 NaN  2
2  NaN   4  4
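A related variant, not part of the original answer, is Series.add() with fill_value=0. It treats a single missing operand as 0 but still returns NaN when both operands are missing, which may fit the question's wish to keep the notion of "missing". The fourth row below is an added assumption to show that case.

```python
import numpy as np
import pandas as pd

# The question's data plus one extra row where both values are missing.
frame = pd.DataFrame({
    "a": [1, 2, np.nan, np.nan],
    "b": [3, np.nan, 4, np.nan],
})

# fill_value=0 substitutes 0 for a NaN on one side only;
# if both sides are NaN, the result stays NaN.
frame["c"] = frame["a"].add(frame["b"], fill_value=0)

print(frame["c"].tolist())  # [4.0, 2.0, 4.0, nan]
```

The original columns a and b are left untouched, so their NaN values remain visible.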
jrjc answered Oct 01 '22 12:10


Another approach:

>>> frame["c"] = frame[["a", "b"]].sum(axis=1)
>>> frame
     a   b  c
0    1   3  4
1    2 NaN  2
2  NaN   4  4
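One caveat worth sketching: with the default settings, sum(axis=1) returns 0 for a row in which every value is NaN. If such rows should stay NaN, the min_count parameter of sum() can be set to 1. The extra all-NaN row and the column name c_min1 below are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# The question's data plus one extra row where both values are missing.
frame = pd.DataFrame({
    "a": [1, 2, np.nan, np.nan],
    "b": [3, np.nan, 4, np.nan],
})

# Default: an all-NaN row sums to 0.0.
frame["c"] = frame[["a", "b"]].sum(axis=1)

# min_count=1 requires at least one non-NaN value per row;
# otherwise the result for that row is NaN.
frame["c_min1"] = frame[["a", "b"]].sum(axis=1, min_count=1)

print(frame["c"].tolist())       # [4.0, 2.0, 4.0, 0.0]
print(frame["c_min1"].tolist())  # [4.0, 2.0, 4.0, nan]
```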
DSM answered Oct 01 '22 12:10