So my dataframe looks like this:
             date    site country  score
    0  2018-01-01  google      us    100
    1  2018-01-01  google      ch     50
    2  2018-01-02  google      us     70
    3  2018-01-03  google      us     60
    4  2018-01-02  google      ch     10
    5  2018-01-01      fb      us     50
    6  2018-01-02      fb      us     55
    7  2018-01-03      fb      us    100
    8  2018-01-01      fb      es    100
    9  2018-01-02      fb      gb    100

Each site has a different score depending on the country. I'm trying to find the 1/3/5-day difference of scores for each site/country combination.
Output should be:
             date    site country  score  diff
    8  2018-01-01      fb      es    100   0.0
    9  2018-01-02      fb      gb    100   0.0
    5  2018-01-01      fb      us     50   0.0
    6  2018-01-02      fb      us     55   5.0
    7  2018-01-03      fb      us    100  45.0
    1  2018-01-01  google      ch     50   0.0
    4  2018-01-02  google      ch     10 -40.0
    0  2018-01-01  google      us    100   0.0
    2  2018-01-02  google      us     70 -30.0
    3  2018-01-03  google      us     60 -10.0

I first tried sorting by site/country/date, then grouping by site and country, but I'm not able to wrap my head around getting a difference from a grouped object.
groupby() accepts a list of column names, so you can group by several columns at once and then apply one or more aggregate functions to each group.
Use DataFrame.groupby().sum() to group rows by one or more columns and compute the sum for each group. groupby() returns a DataFrameGroupBy object, which provides the aggregate method sum() to total a given column within each group.
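As a minimal sketch of the above, using the sample data from the question (the variable names `totals` and `summary` are only illustrative): grouping by two columns and summing, then applying several aggregations at once with agg().

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-02",
             "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-01", "2018-01-02"],
    "site": ["google"] * 5 + ["fb"] * 5,
    "country": ["us", "ch", "us", "us", "ch", "us", "us", "us", "es", "gb"],
    "score": [100, 50, 70, 60, 10, 50, 55, 100, 100, 100],
})

# Group by two columns and sum the score within each (site, country) pair.
totals = df.groupby(["site", "country"])["score"].sum()
print(totals)

# Apply several aggregations to the same groups in one call.
summary = df.groupby(["site", "country"])["score"].agg(["sum", "mean", "count"])
print(summary)
```

The result is indexed by the (site, country) pairs, so a single group can be looked up with something like totals.loc[("google", "us")].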
groupby() can also be used to iterate over DataFrame groups. DataFrame.groupby() splits the data into groups based on some criteria, and each group can then be processed in turn.
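A short sketch of iterating over groups, assuming a cut-down version of the question's data (the `sizes` dictionary is just a hypothetical way to collect something per group). When grouping by a list of two columns, each iteration yields a tuple of key values together with the sub-DataFrame for that group.

```python
import pandas as pd

# Sample data copied from the question (date column omitted for brevity).
df = pd.DataFrame({
    "site": ["google"] * 5 + ["fb"] * 5,
    "country": ["us", "ch", "us", "us", "ch", "us", "us", "us", "es", "gb"],
    "score": [100, 50, 70, 60, 10, 50, 55, 100, 100, 100],
})

sizes = {}
for (site, country), group in df.groupby(["site", "country"]):
    # `group` is a regular DataFrame holding only the rows of this pair.
    sizes[(site, country)] = len(group)
    print(site, country, group["score"].tolist())
```

Iterating like this is handy for inspection, though for the diff in the question a vectorised groupby method is usually the simpler route.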
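What is the difference between pivot_table and groupby? groupby returns its result indexed by the group keys (one row per group) and is generally enough for most aggregations. pivot_table performs the same kind of aggregation but lays the result out as a grid, with one set of keys as rows and another as columns.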
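To make the comparison concrete, here is a small sketch on the question's data, assuming a sum is the aggregation of interest. Both calls compute the same per-(site, country) totals; pivot_table spreads the countries across columns, which groupby only does after an extra unstack().

```python
import pandas as pd

# Sample data copied from the question (date column omitted for brevity).
df = pd.DataFrame({
    "site": ["google"] * 5 + ["fb"] * 5,
    "country": ["us", "ch", "us", "us", "ch", "us", "us", "us", "es", "gb"],
    "score": [100, 50, 70, 60, 10, 50, 55, 100, 100, 100],
})

# Long result: one row per (site, country) pair.
by_group = df.groupby(["site", "country"])["score"].sum()

# Wide result: sites as rows, countries as columns, NaN where a pair is absent.
wide = df.pivot_table(index="site", columns="country", values="score", aggfunc="sum")

# Reshaping the groupby result gives the same grid layout.
wide_from_groupby = by_group.unstack("country")

print(by_group)
print(wide)
```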
First, sort the DataFrame and then all you need is groupby.diff():
    df = df.sort_values(by=['site', 'country', 'date'])

    df['diff'] = df.groupby(['site', 'country'])['score'].diff().fillna(0)

    df
    Out:
             date    site country  score  diff
    8  2018-01-01      fb      es    100   0.0
    9  2018-01-02      fb      gb    100   0.0
    5  2018-01-01      fb      us     50   0.0
    6  2018-01-02      fb      us     55   5.0
    7  2018-01-03      fb      us    100  45.0
    1  2018-01-01  google      ch     50   0.0
    4  2018-01-02  google      ch     10 -40.0
    0  2018-01-01  google      us    100   0.0
    2  2018-01-02  google      us     70 -30.0
    3  2018-01-03  google      us     60 -10.0

sort_values doesn't support arbitrary orderings. If you need to sort arbitrarily (google before fb, for example), store the desired order in a list and convert the column to an ordered categorical with those categories. sort_values will then respect the ordering you provided.
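A minimal sketch of the categorical approach, assuming the desired order is google before fb (the `site_order` list is a hypothetical name for that ordering). Passing observed=True to groupby is an extra choice here so that only the site/country pairs actually present are grouped.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-02",
             "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-01", "2018-01-02"],
    "site": ["google"] * 5 + ["fb"] * 5,
    "country": ["us", "ch", "us", "us", "ch", "us", "us", "us", "es", "gb"],
    "score": [100, 50, 70, 60, 10, 50, 55, 100, 100, 100],
})

# Assumed custom ordering: google should sort before fb.
site_order = ["google", "fb"]
df["site"] = pd.Categorical(df["site"], categories=site_order, ordered=True)

# sort_values now follows the category order instead of alphabetical order.
df = df.sort_values(by=["site", "country", "date"])

# Row-to-row difference within each site/country group; first row of a group gets 0.
df["diff"] = df.groupby(["site", "country"], observed=True)["score"].diff().fillna(0)
print(df)
```

With this ordering the google rows come first, and the per-group differences are the same as in the output above.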