Sum of Every Two Columns in a Pandas DataFrame

Tags:

python

pandas

dataframe

I have a problem when using pandas. My task looks like this:

import pandas as pd
df = pd.DataFrame([(1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 6)], columns=['a', 'b', 'c', 'd', 'e', 'f'])
Out:
    a b c d e f
0   1 2 3 4 5 6
1   1 2 3 4 5 6 
2   1 2 3 4 5 6

I want the output DataFrame to look like this:

Out:
    s1   s2   s3
0   3    7    11
1   3    7    11
2   3    7    11

That is, sum the column pairs (a, b), (c, d), and (e, f) separately and name the resulting columns (s1, s2, s3). Could anyone help me solve this in pandas? Thank you so much.

asked Nov 17 '16 17:11

spind

1 Answer

1) Perform a groupby along the columns by passing axis=1. Per @Boud's comment, a minor tweak to the grouping array (adding 1 so that the labels start at 1) gives exactly the output you want:

df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')

The columns are grouped according to this array:

np.arange(len(df.columns)) // 2
# array([0, 0, 1, 1, 2, 2], dtype=int32)
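
As far as I know, recent pandas releases (2.1 onward) deprecate the axis=1 argument of groupby, so the one-liner above may warn or fail there. A minimal sketch of the same idea that avoids axis=1 by transposing, grouping the rows, and transposing back (the names keys and result are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, 5, 6)] * 3, columns=list("abcdef"))

# Integer-divide each column position by the group width (2), then add 1
# so the labels start at 1: array([1, 1, 2, 2, 3, 3])
keys = np.arange(len(df.columns)) // 2 + 1

# Transpose so the original columns become rows, group those rows by label,
# sum within each group, transpose back, and prefix the integer labels.
result = df.T.groupby(keys).sum().T.add_prefix("s")
print(result)
```

Changing the divisor from 2 to n should group every n consecutive columns instead of every pair.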

2) Use np.add.reduceat, which is a faster alternative:

df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
df.columns = df.columns + 1
df.add_prefix('s')
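
To make the reduceat step easier to follow, here is a small sketch on the question's data. Unlike the snippet above, it does not overwrite df and it keeps the original index; the names starts, sums, and out are illustrative assumptions, not part of the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, 5, 6)] * 3, columns=list("abcdef"))

# Start offset of each pair of columns: array([0, 2, 4])
starts = np.arange(0, df.shape[1], 2)

# For each offset, reduceat sums the slice from that offset up to the next
# one along axis=1; the final slice runs to the last column.
sums = np.add.reduceat(df.to_numpy(), starts, axis=1)

out = pd.DataFrame(
    sums,
    index=df.index,
    columns=[f"s{i}" for i in range(1, len(starts) + 1)],
)
print(out)
```

With an odd number of columns, the last group holds a single column, so the last output column is just a copy of it.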

Timings:

For a DataFrame with 1 million rows and 20 columns:

import numpy as np
import pandas as pd
from string import ascii_lowercase
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 10, (10**6,20)), columns=list(ascii_lowercase[:20]))
df.shape
(1000000, 20)

def with_groupby(df):
    return df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')

def with_reduceat(df):
    df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
    df.columns = df.columns + 1
    return df.add_prefix('s')

# check that both approaches give the same output
with_groupby(df).equals(with_reduceat(df))
True

%timeit with_groupby(df.copy())
1 loop, best of 3: 1.11 s per loop

%timeit with_reduceat(df.copy())   # <--- (>3X faster)
1 loop, best of 3: 345 ms per loop
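
The %timeit lines above need IPython. A rough sketch for reproducing the comparison in a plain script with the standard timeit module follows. It assumes a smaller frame so that it finishes quickly, and it uses a transpose-based groupby instead of axis=1; the function names are hypothetical and the absolute numbers will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Smaller than the 1-million-row frame above so the script finishes quickly.
df = pd.DataFrame(rng.integers(0, 10, size=(10_000, 20)))


def groupby_transposed(frame):
    # Group the transposed rows in pairs, sum, and transpose back.
    keys = np.arange(frame.shape[1]) // 2 + 1
    return frame.T.groupby(keys).sum().T.add_prefix("s")


def reduceat_pairs(frame):
    # Sum adjacent column pairs directly on the underlying NumPy array.
    starts = np.arange(0, frame.shape[1], 2)
    sums = np.add.reduceat(frame.to_numpy(), starts, axis=1)
    cols = [f"s{i}" for i in range(1, len(starts) + 1)]
    return pd.DataFrame(sums, index=frame.index, columns=cols)


# Compare raw values so that dtype differences do not affect the check.
same_values = np.array_equal(
    groupby_transposed(df).to_numpy(), reduceat_pairs(df).to_numpy()
)
print("same values:", same_values)

for func in (groupby_transposed, reduceat_pairs):
    # Best of 3 repeats, 3 calls per repeat, reported per call.
    best = min(timeit.repeat(lambda: func(df), number=3, repeat=3)) / 3
    print(f"{func.__name__}: {best * 1e3:.1f} ms per call")
```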

answered Oct 02 '22 11:10

Nickil Maveli
