So my dataframe looks like this:
date site country score 0 2018-01-01 google us 100 1 2018-01-01 google ch 50 2 2018-01-02 google us 70 3 2018-01-03 google us 60 4 2018-01-02 google ch 10 5 2018-01-01 fb us 50 6 2018-01-02 fb us 55 7 2018-01-03 fb us 100 8 2018-01-01 fb es 100 9 2018-01-02 fb gb 100
Each site
has a different score depending on the country
. I'm trying to find the 1/3/5-day difference of score
s for each site
/country
combination.
Output should be:
date site country score diff 8 2018-01-01 fb es 100 0.0 9 2018-01-02 fb gb 100 0.0 5 2018-01-01 fb us 50 0.0 6 2018-01-02 fb us 55 5.0 7 2018-01-03 fb us 100 45.0 1 2018-01-01 google ch 50 0.0 4 2018-01-02 google ch 10 -40.0 0 2018-01-01 google us 100 0.0 2 2018-01-02 google us 70 -30.0 3 2018-01-03 google us 60 -10.0
I first tried sorting by site
/country
/date
, then grouping by site
and country
but I'm not able to wrap my head around getting a difference from a grouped object.
groupby() can take the list of columns to group by multiple columns and use the aggregate functions to apply single or multiple aggregations at the same time.
Use DataFrame. groupby(). sum() to group rows based on one or multiple columns and calculate sum agg function. groupby() function returns a DataFrameGroupBy object which contains an aggregate function sum() to calculate a sum of a given column for each group.
groupby() to Iterate over Data frame Groups. DataFrame. groupby() function in Python is used to split the data into groups based on some criteria.
What is the difference between the pivot_table and the groupby? The groupby method is generally enough for two-dimensional operations, but pivot_table is used for multi-dimensional grouping operations.
First, sort the DataFrame and then all you need is groupby.diff()
:
df = df.sort_values(by=['site', 'country', 'date']) df['diff'] = df.groupby(['site', 'country'])['score'].diff().fillna(0) df Out: date site country score diff 8 2018-01-01 fb es 100 0.0 9 2018-01-02 fb gb 100 0.0 5 2018-01-01 fb us 50 0.0 6 2018-01-02 fb us 55 5.0 7 2018-01-03 fb us 100 45.0 1 2018-01-01 google ch 50 0.0 4 2018-01-02 google ch 10 -40.0 0 2018-01-01 google us 100 0.0 2 2018-01-02 google us 70 -30.0 3 2018-01-03 google us 60 -10.0
sort_values
doesn't support arbitrary orderings. If you need to sort arbitrarily (google before fb for example) you need to store them in a collection and set your column as categorical. Then sort_values will respect the ordering you provided there.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With