summing two columns in a pandas dataframe

People also ask

How do I sum two columns in pandas DataFrame?

Pandas: Sum the values in two different columns using loc[] and assign the result as a new column. Selecting the columns 'Jan' and 'Feb' with loc[] gives a smaller DataFrame that contains only those two columns. Calling sum() on it with axis=1 then adds the values across those columns for each row and returns a Series.

How do I sum a Pandas DataFrame?

Pandas DataFrame sum() method: the sum() method adds all values in each column and returns the sum for each column. By specifying the column axis (axis='columns'), sum() instead adds across the columns and returns the sum of each row.
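To make the two axis settings concrete, here is a minimal sketch. The data is made up for illustration; only the column names 'Jan' and 'Feb' come from the snippet above.

```python
import pandas as pd

# Hypothetical data; only the column names echo the snippet above.
df = pd.DataFrame({"Jan": [10, 20, 30], "Feb": [1, 2, 3]})

# Default axis=0: one total per column.
column_totals = df.sum()

# axis="columns" (same as axis=1): one total per row.
row_totals = df.sum(axis="columns")

# Select the two columns with loc[] and store the row-wise sum as a new column.
df["Total"] = df.loc[:, ["Jan", "Feb"]].sum(axis=1)

print(column_totals)
print(df)
```

The row-wise result is a Series aligned to the DataFrame's index, which is why it can be assigned directly as a column.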

I think you've misunderstood some Python syntax. The following performs two assignments:

In [11]: a = b = 1

In [12]: a
Out[12]: 1

In [13]: b
Out[13]: 1

So in your code it was as if you were doing:

sum = df['budget'] + df['actual']  # a Series
# and
df['variance'] = df['budget'] + df['actual']  # assigned to a column

The latter creates a new column for df:

In [21]: df
Out[21]:
  cluster                 date  budget  actual
0       a  2014-01-01 00:00:00   11000   10000
1       a  2014-02-01 00:00:00    1200    1000
2       a  2014-03-01 00:00:00     200     100
3       b  2014-04-01 00:00:00     200     300
4       b  2014-05-01 00:00:00     400     450
5       c  2014-06-01 00:00:00     700    1000
6       c  2014-07-01 00:00:00    1200    1000
7       c  2014-08-01 00:00:00     200     100
8       c  2014-09-01 00:00:00     200     300

In [22]: df['variance'] = df['budget'] + df['actual']

In [23]: df
Out[23]:
  cluster                 date  budget  actual  variance
0       a  2014-01-01 00:00:00   11000   10000     21000
1       a  2014-02-01 00:00:00    1200    1000      2200
2       a  2014-03-01 00:00:00     200     100       300
3       b  2014-04-01 00:00:00     200     300       500
4       b  2014-05-01 00:00:00     400     450       850
5       c  2014-06-01 00:00:00     700    1000      1700
6       c  2014-07-01 00:00:00    1200    1000      2200
7       c  2014-08-01 00:00:00     200     100       300
8       c  2014-09-01 00:00:00     200     300       500

As an aside, you shouldn't use sum as a variable name, as this shadows the built-in sum function.
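A small sketch of what that shadowing looks like in practice. The function name shadowed_call and the data are hypothetical; the shadowing is confined to a function so the built-in stays intact elsewhere.

```python
import builtins

import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({"budget": [200, 400], "actual": [300, 450]})


def shadowed_call():
    # Inside this function the name 'sum' now refers to a Series,
    # so the built-in function is hidden.
    sum = df["budget"] + df["actual"]
    try:
        return sum([1, 2, 3])
    except TypeError as exc:
        # A Series is not callable, so the call above fails.
        return type(exc).__name__


shadowed_result = shadowed_call()
print(shadowed_result)

# The built-in is still reachable through the builtins module.
print(builtins.sum([1, 2, 3]))
```

Picking a different name, such as total, avoids the problem entirely.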

df['variance'] = df.loc[:,['budget','actual']].sum(axis=1)

The same thing can be done using a lambda function. Here I am reading the data from an xlsx file.

import pandas as pd
df = pd.read_excel("data.xlsx", sheet_name=4)
print(df)

Output:

  cluster Unnamed: 1      date  budget  actual
0       a 2014-01-01  00:00:00   11000   10000
1       a 2014-02-01  00:00:00    1200    1000
2       a 2014-03-01  00:00:00     200     100
3       b 2014-04-01  00:00:00     200     300
4       b 2014-05-01  00:00:00     400     450
5       c 2014-06-01  00:00:00     700    1000
6       c 2014-07-01  00:00:00    1200    1000
7       c 2014-08-01  00:00:00     200     100
8       c 2014-09-01  00:00:00     200     300

Sum the two columns into a new third column.

df['variance'] = df.apply(lambda x: x['budget'] + x['actual'], axis=1)
print(df)

Output:

  cluster Unnamed: 1      date  budget  actual  variance
0       a 2014-01-01  00:00:00   11000   10000     21000
1       a 2014-02-01  00:00:00    1200    1000      2200
2       a 2014-03-01  00:00:00     200     100       300
3       b 2014-04-01  00:00:00     200     300       500
4       b 2014-05-01  00:00:00     400     450       850
5       c 2014-06-01  00:00:00     700    1000      1700
6       c 2014-07-01  00:00:00    1200    1000      2200
7       c 2014-08-01  00:00:00     200     100       300
8       c 2014-09-01  00:00:00     200     300       500
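For comparison, a minimal sketch showing that the row-wise apply and the plain + operator give the same values. The small DataFrame is built inline here as an assumption, rather than read from the xlsx file.

```python
import pandas as pd

# Hypothetical inline data instead of data.xlsx; values taken from the rows above.
df = pd.DataFrame({
    "cluster": ["a", "a", "b"],
    "budget": [11000, 1200, 200],
    "actual": [10000, 1000, 300],
})

# Row-wise: the lambda is called once per row in Python.
via_apply = df.apply(lambda row: row["budget"] + row["actual"], axis=1)

# Column-wise: the whole addition is done in one vectorized operation.
via_operator = df["budget"] + df["actual"]

print((via_apply == via_operator).all())
```

Because apply with axis=1 runs a Python function for every row, the operator form is usually the better default on large frames; apply is mainly useful when the per-row logic cannot be expressed with column operations.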

You could also use the .add() function:

df.loc[:,'variance'] = df.loc[:,'budget'].add(df.loc[:,'actual'])
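One reason to reach for .add() over + is its fill_value parameter. A minimal sketch with made-up values that include missing data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in each column.
df = pd.DataFrame({
    "budget": [200.0, np.nan, 700.0],
    "actual": [300.0, 450.0, np.nan],
})

# The + operator propagates NaN: the result is missing if either side is missing.
plain = df["budget"] + df["actual"]

# add() with fill_value=0 treats a missing value on one side as 0.
# If both sides are missing, the result is still NaN.
filled = df["budget"].add(df["actual"], fill_value=0)

print(plain)
print(filled)
```

Without fill_value, .add() behaves the same as the + operator.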

This is the most elegant solution; it follows DRY and works well. Note the double brackets, which select a list of columns:

dataframe_name[['col1', 'col2', 'col3']].sum(axis=1, skipna=True)

Thank you.
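To show what skipna changes, here is a small sketch. The column names col1, col2 and col3 are the placeholders from the line above, and the values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row is entirely missing.
df = pd.DataFrame({
    "col1": [1.0, np.nan, np.nan],
    "col2": [2.0, np.nan, np.nan],
    "col3": [3.0, 4.0, np.nan],
})
cols = ["col1", "col2", "col3"]

# skipna=True (the default) ignores NaN; an all-NaN row sums to 0.
skipping = df[cols].sum(axis=1, skipna=True)

# skipna=False returns NaN for any row that contains a NaN.
strict = df[cols].sum(axis=1, skipna=False)

# min_count=1 requires at least one real value, so the all-NaN row stays NaN.
at_least_one = df[cols].sum(axis=1, min_count=1)

print(skipping, strict, at_least_one, sep="\n")
```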

If "budget" has any NaN values but you don't want the sum to be NaN, then try:

import math
import numpy as np

def fun(b, a):
    # Fall back to "actual" alone when "budget" is NaN
    if math.isnan(b):
        return a
    else:
        return b + a

f = np.vectorize(fun, otypes=[float])

df['variance'] = f(df['budget'], df['actual'])
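A possible alternative that stays within pandas, sketched under the assumption that only a missing "budget" should be treated as 0 (as in the function above), is to fill that column before adding. The data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing budget, one missing actual.
df = pd.DataFrame({
    "budget": [np.nan, 1200.0, 200.0],
    "actual": [10000.0, 1000.0, np.nan],
})

# Replace NaN in "budget" with 0, then add. A NaN in "actual" still
# produces NaN, matching the behaviour of fun() above.
df["variance"] = df["budget"].fillna(0) + df["actual"]

print(df)
```

If a missing "actual" should also count as 0, df['budget'].add(df['actual'], fill_value=0) covers both sides.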


Tags:

python

pandas
