I have run into a problem while using pandas. My task is as follows:
import pandas as pd
df = pd.DataFrame([(1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 6)], columns=['a', 'b', 'c', 'd', 'e', 'f'])
Out:
   a  b  c  d  e  f
0  1  2  3  4  5  6
1  1  2  3  4  5  6
2  1  2  3  4  5  6
What I want is an output DataFrame that looks like this:
Out:
   s1  s2  s3
0   3   7  11
1   3   7  11
2   3   7  11
That is to say, sum the column pairs (a, b), (c, d) and (e, f) separately and name the resulting columns (s1, s2, s3). Could anyone help me solve this in pandas? Thank you so much.
Use DataFrame.sum() to total a DataFrame along either axis. By default the method uses axis=0, which sums down the rows and returns one total per column. Pass axis=1 to sum across the columns instead, which returns the sum of all numeric columns for each row.
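To make the two axis settings concrete, here is a minimal sketch using the DataFrame from the question. The variable names col_totals and row_totals are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [(1, 2, 3, 4, 5, 6)] * 3,
    columns=["a", "b", "c", "d", "e", "f"],
)

col_totals = df.sum()        # axis=0 (default): one total per column
row_totals = df.sum(axis=1)  # axis=1: one total per row

print(col_totals.tolist())  # [3, 6, 9, 12, 15, 18]
print(row_totals.tolist())  # [21, 21, 21]
```

Note that neither call gives the pairwise sums asked for; sum(axis=1) collapses all six columns into one.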
Sum of two columns: in pandas, the columns to be summed can be selected by name, e.g. df['a'] and df['b'] (the $ operator belongs to R, not pandas), and the sum of the two columns is then computed with the "+" operator.
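For only three pairs, spelling the additions out by hand is probably the most readable option. A small sketch, assuming the column names from the question:

```python
import pandas as pd

df = pd.DataFrame(
    [(1, 2, 3, 4, 5, 6)] * 3,
    columns=["a", "b", "c", "d", "e", "f"],
)

# Add each pair of columns element-wise and collect the results
out = pd.DataFrame({
    "s1": df["a"] + df["b"],
    "s2": df["c"] + df["d"],
    "s3": df["e"] + df["f"],
})

print(out)
```

This does not scale to many columns, which is where the grouped approaches in the answer come in.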
Pandas DataFrame sum() method: the sum() method adds all values in each column and returns the sum for each column. When the column axis is specified (axis='columns'), sum() adds across the columns and returns the sum of each row.
Use DataFrame.groupby().sum() to group rows by one or more columns and compute the sum for each group. groupby() returns a DataFrameGroupBy object, whose sum() aggregate calculates the sum of the given columns within each group.
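As a quick illustration of row-wise grouping, here is a sketch on a made-up frame. The sales data and the store and units column names are hypothetical and not part of the question.

```python
import pandas as pd

# Hypothetical data, only to show groupby().sum() on rows
sales = pd.DataFrame({
    "store": ["x", "x", "y"],
    "units": [1, 2, 5],
})

# Group rows by the 'store' column, then sum 'units' within each group
totals = sales.groupby("store")["units"].sum()

print(totals.to_dict())  # {'x': 3, 'y': 5}
```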
1) Perform groupby w.r.t. columns by supplying axis=1. Per @Boud's comment, you get exactly what you want with a minor tweak in the grouping array:
df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')
Grouping gets performed according to this condition:
np.arange(len(df.columns)) // 2
# array([0, 0, 1, 1, 2, 2], dtype=int32)
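One caveat, as far as I know: newer pandas releases (2.1 onward) deprecate the axis=1 argument of groupby. A possible workaround, sketched here as an assumption rather than as part of the original answer, is to transpose, group the rows with the same key array, and transpose back:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [(1, 2, 3, 4, 5, 6)] * 3,
    columns=["a", "b", "c", "d", "e", "f"],
)

# Same grouping array as above, shifted so the labels start at 1
keys = np.arange(len(df.columns)) // 2 + 1   # [1, 1, 2, 2, 3, 3]

# Transpose so the original columns become rows, group them, transpose back
out = df.T.groupby(keys).sum().T.add_prefix("s")

print(out)
```

The double transpose copies data, so on large frames it may be slower than the axis=1 form.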
2) Use np.add.reduceat, which is a faster alternative:
df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
df.columns = df.columns + 1
df.add_prefix('s')
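To see what reduceat is doing, here is the same idea on the small frame from the question. np.add.reduceat sums each slice that begins at one of the given start positions and runs up to the next one. The names starts and out are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [(1, 2, 3, 4, 5, 6)] * 3,
    columns=["a", "b", "c", "d", "e", "f"],
)

# Start position of each pair of columns: [0, 2, 4]
starts = np.arange(0, df.shape[1], 2)

# Along axis=1, sum columns [0:2], [2:4] and [4:] of the underlying array
sums = np.add.reduceat(df.to_numpy(), starts, axis=1)

out = pd.DataFrame(
    sums,
    index=df.index,
    columns=[f"s{i}" for i in range(1, len(starts) + 1)],
)

print(out)
```

With an odd number of columns, the last group would contain a single column on its own.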
Timing constraints, for a DataFrame of 1 million rows spanning 20 columns:
import numpy as np
import pandas as pd
from string import ascii_lowercase
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 10, (10**6,20)), columns=list(ascii_lowercase[:20]))
df.shape
(1000000, 20)
def with_groupby(df):
    return df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')
def with_reduceat(df):
    df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
    df.columns = df.columns + 1
    return df.add_prefix('s')
# test whether the two approaches give the same o/p
with_groupby(df).equals(with_reduceat(df))
True
%timeit with_groupby(df.copy())
1 loop, best of 3: 1.11 s per loop
%timeit with_reduceat(df.copy()) # <--- (>3X faster)
1 loop, best of 3: 345 ms per loop
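The %timeit lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard timeit module. This version makes a few assumptions of its own: a smaller frame so it runs quickly, and a transposed groupby (with_groupby_t, a name made up here) in place of axis=1, which newer pandas versions deprecate. No timings are quoted because they depend on the machine and the library versions.

```python
import timeit
from string import ascii_lowercase

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame(
    rng.integers(0, 10, (10**4, 20)),
    columns=list(ascii_lowercase[:20]),
)

def with_groupby_t(df):
    # Transposed variant of the groupby approach (assumption, see lead-in)
    keys = np.arange(len(df.columns)) // 2 + 1
    return df.T.groupby(keys).sum().T.add_prefix("s")

def with_reduceat(df):
    starts = np.arange(len(df.columns))[::2]
    out = pd.DataFrame(np.add.reduceat(df.to_numpy(), starts, axis=1))
    out.columns = out.columns + 1
    return out.add_prefix("s")

# Check that both approaches produce the same numbers before timing them
same_values = np.array_equal(
    with_groupby_t(df).to_numpy(), with_reduceat(df).to_numpy()
)
print("same values:", same_values)

for fn in (with_groupby_t, with_reduceat):
    # Best of 3 repeats, 5 calls per repeat
    best = min(timeit.repeat(lambda: fn(df), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```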