I have this DataFrame:
df = pd.DataFrame({'site': ['a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a'],
'day': [1, 1, 1, 1, 1, 1, 2, 2, 2],
'hour': [1, 2, 3, 1, 2, 3, 1, 2, 3],
'clicks': [100, 200, 50, 0, 20, 30, 10, 0, 20]})
# site day hour clicks
# 0 a 1 1 100
# 1 a 1 2 200
# 2 a 1 3 50
# 3 b 1 1 0
# 4 b 1 2 20
# 5 b 1 3 30
# 6 a 2 1 10
# 7 a 2 2 0
# 8 a 2 3 20
What I want to achieve: Group these data by 'site' and 'day' and add a field (has_hour_1_clicks) that indicates if for that site/day the value of clicks at 'hour'==1 was more than 0. I understand that for the grouping I basically need this:
df.groupby(['site', 'day'])
But I can't find a way to get what I want with transform(), filter() or aggr().
So the DataFrame I want to get out of this would look like this:
# site day hour clicks has_hour_1_clicks
# 0 a 1 1 100 True
# 1 a 1 2 200 True
# 2 a 1 3 50 True
# 3 b 1 1 0 False
# 4 b 1 2 20 False
# 5 b 1 3 30 False
# 6 a 2 1 10 True
# 7 a 2 2 0 True
# 8 a 2 3 20 True
This looks like a job for GroupBy.transform:
(df.eval('has_clicks = hour == 1 and clicks > 0')
.groupby(['site', 'day'])['has_clicks']
.transform('any'))
0 True
1 True
2 True
3 False
4 False
5 False
6 True
7 True
8 True
Name: has_clicks, dtype: bool
How It Works
First, find rows that match your condition:
# eval statement is doing exactly this, but more succinctly
df.assign(has_clicks=(df['hour'] == 1) & (df['clicks'] > 0))
site day hour clicks has_clicks
0 a 1 1 100 True
1 a 1 2 200 False
2 a 1 3 50 False
3 b 1 1 0 False
4 b 1 2 20 False
5 b 1 3 30 False
6 a 2 1 10 True
7 a 2 2 0 False
8 a 2 3 20 False
Next, take the "has_clicks" column, and transform all rows in that group to "True" if any of the rows in that group is true. This is what transform('any') is doing:
_.groupby(['site', 'day'])['has_clicks'].transform('any')
0 True
1 True
2 True
3 False
4 False
5 False
6 True
7 True
8 True
Name: has_clicks, dtype: bool
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With