I have a DataFrame like this one:
   item_name item_category scraping_date  price
0    Michel1     Category1    2018-04-14   21.0
1    Michel1     Category1    2018-04-16   42.1
2    Michel1     Category1    2018-04-17   84.0
3    Michel1     Category1    2018-04-19  126.2
4    Michel1     Category1    2018-04-20  168.3
5    Michel1     Category2    2018-04-23   21.2
6    Michel1     Category2    2018-05-08   42.0
7    Michel1     Category2    2018-03-26   84.1
8    Michel1     Category2    2018-03-31  126.2
9    Michel1     Category2    2018-04-01  168.3
10   Michel2     Category1    2018-04-04   21.0
11   Michel2     Category1    2018-04-05   42.1
12   Michel2     Category1    2018-04-09   84.2
13   Michel2     Category1    2018-04-11  126.3
14   Michel2     Category1    2018-04-12  168.4
15   Michel2     Category2    2018-04-13   21.0
16   Michel2     Category2    2018-05-03   42.1
17   Michel2     Category2    2018-04-25   84.2
18   Michel2     Category2    2018-04-28  126.3
19   Michel2     Category2    2018-04-29  168.4
I would like to group by item name and category, resample by week, and get the average price per week. Finally, I would like to output the data as a list of dicts like this:
[
    {
        "item_name": "Michel1",
        "item_category": "Category1",
        "prices": [
            {"week": "1", "average": "84.2"},
            {"week": "2", "average": "84.2"}
        ]
    },
    {
        "item_name": "Michel1",
        "item_category": "Category2",
        "prices": [
            {"week": "1", "average": "84.2"},
            {"week": "2", "average": "84.2"}
        ]
    },....
]
I came up with something that groups the data and computes the average, but I cannot transform the result into that structure:
df["price"] = df["price"].astype(float)
df["scraping_date"] = pd.to_datetime(df["scraping_date"])
df.set_index("scraping_date").groupby(["item_name","item_category"])["price"].resample("W").mean()
If I call .to_dict() on the result, I get this, which is not quite what I want:
{('Michel1', 'Category1', Timestamp('2017-12-03 00:00:00')): 20.0,
('Michel1', 'Category1', Timestamp('2017-12-10 00:00:00')): 20.0,
('Michel1', 'Category2', Timestamp('2017-12-17 00:00:00')): 20.0,
('Michel1', 'Category2', Timestamp('2017-12-24 00:00:00')): 20.0,
('Michel2', 'Category1', Timestamp('2017-12-31 00:00:00')): 20.0,
('Michel2', 'Category1', Timestamp('2018-01-07 00:00:00')): 20.0,
}
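I cannot guarantee the speed, but you can do this with a groupby followed by apply:
# Series.dt.week was removed in pandas 2.0; dt.isocalendar().week is the replacement
df['Week'] = pd.to_datetime(df.scraping_date).dt.isocalendar().week
# The 'r' shorthand for to_dict is no longer accepted in pandas 2.0; spell out 'records'
df.groupby(['item_name','item_category']).apply(lambda x : x.groupby(['Week']).price.mean().to_frame('average')
.reset_index().to_dict('records')).to_frame('price').reset_index().to_dict('records')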
Out[51]:
[{'item_category': 'Category1',
'item_name': 'Michel1',
'price': [{'Week': 15.0, 'average': 21.0},
{'Week': 16.0, 'average': 105.15}]},
{'item_category': 'Category2',
'item_name': 'Michel1',
'price': [{'Week': 13.0, 'average': 126.2},
{'Week': 17.0, 'average': 21.2},
{'Week': 19.0, 'average': 42.0}]},
{'item_category': 'Category1',
'item_name': 'Michel2',
'price': [{'Week': 14.0, 'average': 31.55},
{'Week': 15.0, 'average': 126.3}]},
{'item_category': 'Category2',
'item_name': 'Michel2',
'price': [{'Week': 15.0, 'average': 21.0},
{'Week': 17.0, 'average': 126.3},
{'Week': 18.0, 'average': 42.1}]}]
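If you want the key names from the question ("prices", "week", "average") and calendar-week bins like resample("W") gives, here is a minimal sketch of one alternative. It is not the only way to do it, and it makes a few assumptions: the sample frame is a subset of the question's rows, the week is labelled by the date on which the weekly bin ends (the question does not say how weeks should be numbered), and averages are rounded to two decimals.

```python
import pandas as pd

# Subset of the question's data, used here only for illustration.
df = pd.DataFrame({
    "item_name": ["Michel1"] * 7,
    "item_category": ["Category1"] * 5 + ["Category2"] * 2,
    "scraping_date": ["2018-04-14", "2018-04-16", "2018-04-17", "2018-04-19",
                      "2018-04-20", "2018-04-23", "2018-05-08"],
    "price": [21.0, 42.1, 84.0, 126.2, 168.3, 21.2, 42.0],
})
df["scraping_date"] = pd.to_datetime(df["scraping_date"])

# Weekly mean per (item_name, item_category). pd.Grouper with freq="W"
# uses week bins ending on Sunday, the same bins as resample("W").
weekly = (
    df.groupby(["item_name", "item_category",
                pd.Grouper(key="scraping_date", freq="W")])["price"]
    .mean()
    .dropna()        # drop any empty weeks, if present
    .reset_index()
)

# Build one dict per (item_name, item_category) pair.
result = []
for (name, category), group in weekly.groupby(["item_name", "item_category"]):
    prices = [
        # Assumption: label each week by the date its bin ends on.
        {"week": date.strftime("%Y-%m-%d"), "average": round(float(avg), 2)}
        for date, avg in zip(group["scraping_date"], group["price"])
    ]
    result.append({"item_name": name, "item_category": category, "prices": prices})

print(result)
```

Because every value in result is a plain str or float, it can be passed straight to json.dumps. If you would rather have week numbers than dates, swapping the label for date.isocalendar()[1] should work, at the cost of mixing weeks from different years.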