Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sum of Every Two Columns in Pandas dataframe

When I am using Pandas, I have a problem. My task is like this:

df=pd.DataFrame([(1,2,3,4,5,6),(1,2,3,4,5,6),(1,2,3,4,5,6)],columns=['a','b','c','d','e','f'])
Out:
    a b c d e f
0   1 2 3 4 5 6
1   1 2 3 4 5 6 
2   1 2 3 4 5 6

what I want to do is the output dataframe looks like this:

Out:
    s1   s2   s3
0   3    7    11
1   3    7    11
2   3    7    11

That is to say, sum the column (a,b),(c,d),(e,f) separately and rename the result columns names as (s1,s2,s3). Could anyone help solve this problem in Pandas? Thank you so much.

like image 240
spind Avatar asked Nov 17 '16 17:11

spind


People also ask

How do you get the sum of two columns in pandas?

Using DataFrame. sum() to get sum/total of a DataFrame for both rows and columns, to get the total sum of columns use axis=1 param. By default, this method takes axis=0 which means summing of rows. Yields below output. The above example calculates the sum of all numeric columns for each row.

How do you find the sum of two columns in a data frame?

Sum of two columns The columns whose sum has to be calculated can be called through the $ operator and then we can perform the sum of two dataframe columns by using “+” operator.

How do I get the sum of all columns in pandas?

Pandas DataFrame sum() MethodThe sum() method adds all values in each column and returns the sum for each column. By specifying the column axis ( axis='columns' ), the sum() method searches column-wise and returns the sum of each row.

How do I sum two columns in Groupby pandas?

Use DataFrame. groupby(). sum() to group rows based on one or multiple columns and calculate sum agg function. groupby() function returns a DataFrameGroupBy object which contains an aggregate function sum() to calculate a sum of a given column for each group.


1 Answers

1) Perform groupby w.r.t columns by supplying axis=1. Per @Boud's comment, you exactly get what you want with a minor tweak in the grouping array:

df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')

enter image description here

Grouping gets performed according to this condition:

np.arange(len(df.columns)) // 2
# array([0, 0, 1, 1, 2, 2], dtype=int32)

2) Use np.add.reduceat which is a faster alternative:

df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
df.columns = df.columns + 1
df.add_prefix('s')

enter image description here

Timing Constraints:

For a DF of 1 million rows spanned over 20 columns:

from string import ascii_lowercase
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 10, (10**6,20)), columns=list(ascii_lowercase[:20]))
df.shape
(1000000, 20)

def with_groupby(df):
    return df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')

def with_reduceat(df):
    df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
    df.columns = df.columns + 1
    return df.add_prefix('s')

# test whether they give the same o/p
with_groupby(df).equals(with_groupby(df))
True

%timeit with_groupby(df.copy())
1 loop, best of 3: 1.11 s per loop

%timeit with_reduceat(df.copy())   # <--- (>3X faster)
1 loop, best of 3: 345 ms per loop
like image 166
Nickil Maveli Avatar answered Oct 02 '22 11:10

Nickil Maveli