Pandas equivalent to SQL window functions

Is there an idiomatic equivalent to SQL's window functions in pandas? For example, what is the most compact way to write the equivalent of this query in pandas?

SELECT state_name,  
       state_population,
       SUM(state_population)
        OVER() AS national_population
FROM population   
ORDER BY state_name

Or of this one?

SELECT state_name,  
       state_population,
       region,
       SUM(state_population)
        OVER(PARTITION BY region) AS regional_population
FROM population    
ORDER BY state_name

asked Jan 10 '17 16:01

2daaa

1 Answer

For the first SQL query:

SELECT state_name,  
       state_population,
       SUM(state_population)
        OVER() AS national_population
FROM population   
ORDER BY state_name

Pandas:

df.assign(national_population=df.state_population.sum()).sort_values('state_name')
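An empty OVER() is just a scalar aggregate broadcast to every row, so it also composes with ordinary column arithmetic. As a rough sketch that goes slightly beyond the question, here is each state's share of the national total, roughly what state_population / SUM(state_population) OVER() would give in SQL. The column name national_share is made up for illustration, and the sample values are copied from the demo frame in this answer:

```python
import pandas as pd

# Sample data copied from the demo frame in this answer.
df = pd.DataFrame({
    "region": [1, 1, 2, 2, 2, 3],
    "state_name": ["aaa", "bbb", "ccc", "ddd", "eee", "xxx"],
    "state_population": [100, 110, 200, 100, 100, 55],
})

# The sum is a single scalar; assign() broadcasts it to every row.
national_total = df["state_population"].sum()

result = df.assign(
    national_population=national_total,
    # Hypothetical extra column: each state's fraction of the total.
    national_share=df["state_population"] / national_total,
).sort_values("state_name")

print(result)
```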

For the second SQL query:

SELECT state_name,  
       state_population,
       region,
       SUM(state_population)
        OVER(PARTITION BY region) AS regional_population
FROM population    
ORDER BY state_name

Pandas:

df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
  .sort_values('state_name')
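The reason transform works here, rather than a plain aggregation, is that it returns a result aligned with the original index (one value per input row), which is what PARTITION BY does. A plain groupby sum collapses to one row per group, like GROUP BY. A small sketch of the difference, using the same sample values as the demo frame in this answer:

```python
import pandas as pd

# Sample data copied from the demo frame in this answer.
df = pd.DataFrame({
    "region": [1, 1, 2, 2, 2, 3],
    "state_name": ["aaa", "bbb", "ccc", "ddd", "eee", "xxx"],
    "state_population": [100, 110, 200, 100, 100, 55],
})

grouped = df.groupby("region")["state_population"]

# Reduction: one value per region, like GROUP BY region.
per_region = grouped.sum()

# transform: the same sums, repeated for every row of the original
# frame, like SUM(...) OVER (PARTITION BY region).
per_row = grouped.transform("sum")

result = df.assign(regional_population=per_row).sort_values("state_name")

print(per_region)
print(result)
```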

DEMO:

In [238]: df
Out[238]:
   region state_name  state_population
0       1        aaa               100
1       1        bbb               110
2       2        ccc               200
3       2        ddd               100
4       2        eee               100
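To reproduce the demo, the frame above can be rebuilt as follows. This is only a sketch: the values are copied from the output above, and the default integer index is assumed.

```python
import pandas as pd

# Values copied from the demo output; column order matches the display.
df = pd.DataFrame({
    "region": [1, 1, 2, 2, 2, 3],
    "state_name": ["aaa", "bbb", "ccc", "ddd", "eee", "xxx"],
    "state_population": [100, 110, 200, 100, 100, 55],
})

print(df)
```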
5       3        xxx                55

national_population:

In [246]: df.assign(national_population=df.state_population.sum()).sort_values('state_name')
Out[246]:
   region state_name  state_population  national_population
0       1        aaa               100                  665
1       1        bbb               110                  665
2       2        ccc               200                  665
3       2        ddd               100                  665
4       2        eee               100                  665
5       3        xxx                55                  665

regional_population:

In [239]: df.assign(regional_population=df.groupby('region')['state_population'].transform('sum')) \
     ...:   .sort_values('state_name')
Out[239]:
   region state_name  state_population  regional_population
0       1        aaa               100                  210
1       1        bbb               110                  210
2       2        ccc               200                  400
3       2        ddd               100                  400
4       2        eee               100                  400
5       3        xxx                55                   55
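The two queries in the question only use unordered windows. For windows with an ORDER BY, the same groupby pattern seems to carry over using the cumulative and ranking methods. The sketch below is not part of the original question; the output column names (running_population, row_number, population_rank) are made up for illustration, and the SQL in the comments is only the approximate equivalent:

```python
import pandas as pd

# Sample data copied from the demo frame in this answer.
df = pd.DataFrame({
    "region": [1, 1, 2, 2, 2, 3],
    "state_name": ["aaa", "bbb", "ccc", "ddd", "eee", "xxx"],
    "state_population": [100, 110, 200, 100, 100, 55],
})

# Sort first so that cumulative operations follow the intended ORDER BY.
ordered = df.sort_values(["region", "state_name"])
by_region = ordered.groupby("region")["state_population"]

out = ordered.assign(
    # SUM(state_population) OVER (PARTITION BY region ORDER BY state_name)
    running_population=by_region.cumsum(),
    # ROW_NUMBER() OVER (PARTITION BY region ORDER BY state_name)
    row_number=ordered.groupby("region").cumcount() + 1,
    # RANK() OVER (PARTITION BY region ORDER BY state_population DESC)
    population_rank=by_region.rank(method="min", ascending=False).astype(int),
)

print(out)
```

Note that method="min" is what makes ties share the lowest rank, as SQL's RANK() does; method="dense" would correspond to DENSE_RANK().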

answered Sep 28 '22 11:09

MaxU - stop WAR against UA

Tags: python, sql, pandas, window-functions
