If I add two columns to create a third, any rows containing NaN (representing missing data in my world) cause the resulting output column to be NaN as well. Is there a way to skip NaNs without explicitly setting the values to 0 (which would lose the notion that those values are "missing")?
In [42]: frame = pd.DataFrame({'a': [1, 2, np.nan], 'b': [3, np.nan, 4]})

In [44]: frame['c'] = frame['a'] + frame['b']

In [45]: frame
Out[45]:
     a    b    c
0    1    3    4
1    2  NaN  NaN
2  NaN    4  NaN
In the above, I would like column c to be [4, 2, 4].
Thanks...
Use the sum() method to find a sum that ignores NaN values. Leave the skipna parameter at its default, skipna=True, to sum a DataFrame along the specified axis while ignoring NaN values. If you set skipna=False, any row or column that contains a NaN produces a NaN sum.
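As a minimal sketch of the skipna behavior, using the frame from the question (the names row_sum_skip and row_sum_keep are only illustrative):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, np.nan], 'b': [3, np.nan, 4]})

# skipna=True is the default: NaN entries are ignored in each row sum.
row_sum_skip = frame[['a', 'b']].sum(axis=1)

# skipna=False: a row containing any NaN yields NaN.
row_sum_keep = frame[['a', 'b']].sum(axis=1, skipna=False)

print(row_sum_skip.tolist())
print(row_sum_keep.tolist())
```

The first result matches the [4, 2, 4] the question asks for, while the second reproduces the behavior of the plain + operator.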
Use dropna() function to drop rows with NaN / None values in pandas DataFrame.
Pandas: sum values in two different columns using loc[] and assign the result as a new column. Select the columns (for example 'Jan' and 'Feb') with loc[] to get a smaller DataFrame that contains only those two columns. Then call sum() with axis=1, which adds the values across those columns in each row and returns a Series object.
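The loc[] approach might look like the sketch below. The DataFrame name sales, its values, and the new column name 'Total' are made up for illustration; only the column names 'Jan' and 'Feb' come from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical data: values are invented for this example.
sales = pd.DataFrame({'Jan': [10.0, 20.0, np.nan], 'Feb': [5.0, np.nan, 7.0]})

# loc[] keeps all rows and only the two named columns;
# sum(axis=1) then adds across each row, skipping NaN by default.
sales['Total'] = sales.loc[:, ['Jan', 'Feb']].sum(axis=1)

print(sales)
```

The original 'Jan' and 'Feb' columns are not modified, so the missing values stay marked as NaN there.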
In applied data science, you will usually have missing data. For example, an industrial application with sensors will have sensor data that is missing on certain days. You have a couple of alternatives to work with missing data.
With fillna(), which returns a new object and leaves the NaN values in the original columns untouched:
frame['c'] = frame.fillna(0)['a'] + frame.fillna(0)['b']
or, as suggested:
frame['c'] = frame.a.fillna(0) + frame.b.fillna(0)
giving:
     a    b  c
0    1    3  4
1    2  NaN  2
2  NaN    4  4
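A related option, not mentioned in the answers here, is Series.add() with fill_value=0. It treats a missing value as 0 only when the other operand is present, so a row where both columns are NaN stays NaN. The sketch below extends the question's frame with one such row (an assumption added to show the difference):

```python
import numpy as np
import pandas as pd

# The last row is an added, hypothetical case where both values are missing.
frame = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                      'b': [3, np.nan, 4, np.nan]})

# fill_value=0 substitutes 0 for a NaN on one side only;
# if both sides are NaN, the result is NaN.
frame['c'] = frame['a'].add(frame['b'], fill_value=0)

print(frame)
```

This keeps the "missing" meaning for rows that have no data at all, whereas fillna(0) on both columns would report 0 for them.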
Another approach:
>>> frame["c"] = frame[["a", "b"]].sum(axis=1)
>>> frame
     a    b  c
0    1    3  4
1    2  NaN  2
2  NaN    4  4
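One caveat with sum(axis=1): a row in which every value is NaN sums to 0 by default. If such rows should remain NaN, the min_count parameter of sum() can help. The sketch below assumes an extra all-NaN row that is not in the original question:

```python
import numpy as np
import pandas as pd

# The last row is a hypothetical case added to show the difference.
frame = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                      'b': [3, np.nan, 4, np.nan]})

# Default min_count=0: an all-NaN row sums to 0.
default_sum = frame[['a', 'b']].sum(axis=1)

# min_count=1 requires at least one non-NaN value, otherwise the result is NaN.
strict_sum = frame[['a', 'b']].sum(axis=1, min_count=1)

print(default_sum.tolist())
print(strict_sum.tolist())
```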