Pandas groupby multiple fields then diff

So my dataframe looks like this:

             date    site country  score
    0  2018-01-01  google      us    100
    1  2018-01-01  google      ch     50
    2  2018-01-02  google      us     70
    3  2018-01-03  google      us     60
    4  2018-01-02  google      ch     10
    5  2018-01-01      fb      us     50
    6  2018-01-02      fb      us     55
    7  2018-01-03      fb      us    100
    8  2018-01-01      fb      es    100
    9  2018-01-02      fb      gb    100

Each site has a different score depending on the country. I'm trying to find the 1/3/5-day difference of scores for each site/country combination.

Output should be:

             date    site country  score  diff
    8  2018-01-01      fb      es    100   0.0
    9  2018-01-02      fb      gb    100   0.0
    5  2018-01-01      fb      us     50   0.0
    6  2018-01-02      fb      us     55   5.0
    7  2018-01-03      fb      us    100  45.0
    1  2018-01-01  google      ch     50   0.0
    4  2018-01-02  google      ch     10 -40.0
    0  2018-01-01  google      us    100   0.0
    2  2018-01-02  google      us     70 -30.0
    3  2018-01-03  google      us     60 -10.0

I first tried sorting by site/country/date and then grouping by site and country, but I can't work out how to get a difference from a grouped object.

asked Jan 19 '18 18:01

Craig

1 Answer

First, sort the DataFrame and then all you need is groupby.diff():

    df = df.sort_values(by=['site', 'country', 'date'])

    df['diff'] = df.groupby(['site', 'country'])['score'].diff().fillna(0)

    df
    Out:
             date    site country  score  diff
    8  2018-01-01      fb      es    100   0.0
    9  2018-01-02      fb      gb    100   0.0
    5  2018-01-01      fb      us     50   0.0
    6  2018-01-02      fb      us     55   5.0
    7  2018-01-03      fb      us    100  45.0
    1  2018-01-01  google      ch     50   0.0
    4  2018-01-02  google      ch     10 -40.0
    0  2018-01-01  google      us    100   0.0
    2  2018-01-02  google      us     70 -30.0
    3  2018-01-03  google      us     60 -10.0
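The question also mentions longer lags (3 and 5 days). As a rough sketch, the same groupby can be reused with the `periods` argument of `diff()`. The column names `diff_1d` and `diff_2d` are made up for this illustration, and the sketch assumes each site/country group has exactly one row per day, because `periods=n` counts rows, not calendar days. The sample data has at most three rows per group, so only lags 1 and 2 are shown; 3 and 5 would work the same way on longer series.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-01", "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-02",
        "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-01", "2018-01-02",
    ]),
    "site": ["google"] * 5 + ["fb"] * 5,
    "country": ["us", "ch", "us", "us", "ch", "us", "us", "us", "es", "gb"],
    "score": [100, 50, 70, 60, 10, 50, 55, 100, 100, 100],
})

df = df.sort_values(by=["site", "country", "date"])
grouped = df.groupby(["site", "country"])["score"]

# diff(periods=n) subtracts the value n rows earlier within the same group.
# That is an n-day difference only if every group has one row per day
# (an assumption of this sketch, not something pandas checks).
for n in (1, 2):
    df[f"diff_{n}d"] = grouped.diff(periods=n).fillna(0)

print(df)
```

Note that `fillna(0)` follows the answer above; it makes "no earlier row to compare with" look the same as "no change", so you may prefer to leave the NaN values in place.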

By default, sort_values orders strings alphabetically, so it won't give you a custom order on its own. If you need one (google before fb, for example), list the values in the order you want and convert the column to an ordered categorical with those categories. sort_values will then respect that order.
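A minimal sketch of the categorical approach, on a cut-down frame rather than the full data from the question. The list name `site_order` is just an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({
    "site": ["fb", "google", "fb", "google"],
    "country": ["us", "us", "es", "ch"],
    "score": [50, 100, 100, 50],
})

# Spell out the desired order; here "google" should sort before "fb".
site_order = ["google", "fb"]
df["site"] = pd.Categorical(df["site"], categories=site_order, ordered=True)

# sort_values follows the category order for "site" and the usual
# alphabetical order for "country".
df = df.sort_values(by=["site", "country"])

print(df)
```

Any value of `site` that is not listed in `site_order` becomes NaN when the column is converted, so the list should cover every site in the data.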

answered Sep 20 '22 23:09

ayhan

Tags: python, pandas, dataframe, group-by
