Pandas: Sum of the Max 3 Column Values in Each Row

Tags: python, pandas

Sample data:

        0        1         2        3       4  Sum_max_3
0  591949  2575703  22479693  2202865  499835   27258261
1    2705    11426    339913     5438    1016     356777
2      18      119      4162       18       0       4299
3     264     1213     14999      246     116      16476
4       0       35      1292       10       0       1337
5       0        0      1442        0       0       1442
6       0       28      5596       20       0       5644
7       0       10       102       56       0        168
8      33        0      1224       17       0       1274
9      39      198      9505       62      35       9765

I want the sum of the 3 largest values in each row. The columns holding those values differ from row to row (the expected result is shown as Sum_max_3).

I have a very large number of columns, so this needs to work automatically across all of them.


asked Sep 16 '18 20:09

2 Answers

Use the underlying NumPy array (df.values), sort each row with np.sort, and sum the last 3 columns of the sorted array, which hold the 3 largest values of each row:

>>> df
        0        1         2        3       4
0  591949  2575703  22479693  2202865  499835
1    2705    11426    339913     5438    1016
2      18      119      4162       18       0
3     264     1213     14999      246     116
4       0       35      1292       10       0
5       0        0      1442        0       0
6       0       28      5596       20       0
7       0       10       102       56       0
8      33        0      1224       17       0
9      39      198      9505       62      35


>>> df['Sum_max_3'] = np.sum(np.sort(df.values)[:,-3:],axis=1)

>>> df
        0        1         2        3       4  Sum_max_3
0  591949  2575703  22479693  2202865  499835   27258261
1    2705    11426    339913     5438    1016     356777
2      18      119      4162       18       0       4299
3     264     1213     14999      246     116      16476
4       0       35      1292       10       0       1337
5       0        0      1442        0       0       1442
6       0       28      5596       20       0       5644
7       0       10       102       56       0        168
8      33        0      1224       17       0       1274
9      39      198      9505       62      35       9765
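
The same idea can be wrapped in a small helper. This is a minimal sketch rather than part of the original answer: the function name sum_top_k_sorted and the parameter k are made up for illustration, and it assumes every column is numeric and that the result column has not yet been added to the frame (otherwise it would be counted as an input).

```python
import numpy as np
import pandas as pd

# First four rows of the sample data from the question.
df = pd.DataFrame([
    [591949, 2575703, 22479693, 2202865, 499835],
    [2705, 11426, 339913, 5438, 1016],
    [18, 119, 4162, 18, 0],
    [264, 1213, 14999, 246, 116],
])

def sum_top_k_sorted(frame, k=3):
    """Hypothetical helper: sum the k largest values in each row via a full sort."""
    values = frame.to_numpy()
    # Sorting along axis=1 orders each row ascending, so the k largest
    # values of every row end up in the last k columns.
    return np.sort(values, axis=1)[:, -k:].sum(axis=1)

# The sums are computed before the assignment, so the new column
# is not part of the input.
df["Sum_max_3"] = sum_top_k_sorted(df)
print(df)
```

Running the helper a second time on the same frame would include Sum_max_3 among the inputs, so select the original columns first in that case.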

answered Sep 29 '22 09:09

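Use numpy.partition to avoid a full sort. Partitioning a row around its third-largest element places the three largest values in the last three positions (in no particular order, which does not matter for a sum). Per row, this reduces the cost from O(n log n) to O(n), where n is the number of columns, and the difference is substantial on wide DataFrames (see the timings below):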

np.sum(np.partition(df.values, -3)[:, -3:], 1)

df.assign(max3=np.sum(np.partition(df.values, -3)[:, -3:], 1))

        0        1         2        3       4      max3
0  591949  2575703  22479693  2202865  499835  27258261
1    2705    11426    339913     5438    1016    356777
2      18      119      4162       18       0      4299
3     264     1213     14999      246     116     16476
4       0       35      1292       10       0      1337
5       0        0      1442        0       0      1442
6       0       28      5596       20       0      5644
7       0       10       102       56       0       168
8      33        0      1224       17       0      1274
9      39      198      9505       62      35      9765
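
A generalised version of the partition approach might look like the sketch below. The helper name sum_top_k_partition and the bounds check on k are assumptions added for illustration; the answer itself only shows the one-liner for k = 3.

```python
import numpy as np
import pandas as pd

# Four more rows of the sample data from the question.
df = pd.DataFrame([
    [0, 35, 1292, 10, 0],
    [0, 0, 1442, 0, 0],
    [0, 28, 5596, 20, 0],
    [33, 0, 1224, 17, 0],
])

def sum_top_k_partition(frame, k=3):
    """Hypothetical helper: sum the k largest values in each row without a full sort."""
    values = frame.to_numpy()
    if not 1 <= k <= values.shape[1]:
        raise ValueError("k must be between 1 and the number of columns")
    # With kth=-k, each row is rearranged so that the element at index -k
    # is in its sorted position and everything after it is >= that element.
    # The last k entries are therefore the k largest, in arbitrary order.
    part = np.partition(values, -k, axis=1)
    return part[:, -k:].sum(axis=1)

# assign returns a new DataFrame and leaves df unchanged.
result = df.assign(max3=sum_top_k_partition(df))
print(result)
```

Because assign does not modify df in place, the helper can be called again on df without the result column leaking into the input.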

Timings

In [411]: df = pd.DataFrame(np.random.rand(5000, 5000))

In [412]: %timeit np.sum(np.sort(df.values)[:,-3:],axis=1)
1.69 s ± 92.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [413]: %timeit np.sum(np.partition(df.values, -3)[:, -3:], 1)
364 ms ± 23.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
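
Outside IPython, a comparable measurement can be sketched with the standard timeit module. The array here is deliberately smaller than the 5000 x 5000 frame above so that it runs quickly; the function names by_sort and by_partition are illustrative, and the absolute numbers will vary by machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
values = rng.random((500, 500))  # assumed size, much smaller than the benchmark above

def by_sort(a, k=3):
    # Full row-wise sort, then sum the last k columns.
    return np.sort(a, axis=1)[:, -k:].sum(axis=1)

def by_partition(a, k=3):
    # Partial partition: only the last k positions are guaranteed to hold the k largest.
    return np.partition(a, -k, axis=1)[:, -k:].sum(axis=1)

# The summation order differs between the two methods, so compare with a tolerance.
same = np.allclose(by_sort(values), by_partition(values))
print("results agree:", same)

calls = 5
for name, func in [("sort", by_sort), ("partition", by_partition)]:
    # Take the best of three repeats, each timing `calls` invocations.
    seconds = min(timeit.repeat(lambda: func(values), number=calls, repeat=3))
    print(f"{name}: {seconds / calls * 1e3:.2f} ms per call")
```

Checking that both methods agree before timing them guards against benchmarking two functions that do not compute the same thing.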

answered Sep 29 '22 09:09

user3483203

Asked by user1581390. Answered by sacuL and user3483203.