
Formatting a pandas Series grouped by two columns and resampled on a third (with a mean) into a dict

I have a DataFrame like this one:

    item_name   item_category   scraping_date        price
0   Michel1     Category1       2018-04-14           21.0
1   Michel1     Category1       2018-04-16           42.1
2   Michel1     Category1       2018-04-17           84.0
3   Michel1     Category1       2018-04-19           126.2
4   Michel1     Category1       2018-04-20           168.3
5   Michel1     Category2       2018-04-23           21.2
6   Michel1     Category2       2018-05-08           42.0
7   Michel1     Category2       2018-03-26           84.1
8   Michel1     Category2       2018-03-31           126.2
9   Michel1     Category2       2018-04-01           168.3
10  Michel2     Category1       2018-04-04           21.0
11  Michel2     Category1       2018-04-05           42.1
12  Michel2     Category1       2018-04-09           84.2
13  Michel2     Category1       2018-04-11           126.3
14  Michel2     Category1       2018-04-12           168.4
15  Michel2     Category2       2018-04-13           21.0
16  Michel2     Category2       2018-05-03           42.1
17  Michel2     Category2       2018-04-25           84.2
18  Michel2     Category2       2018-04-28           126.3
19  Michel2     Category2       2018-04-29           168.4

I would like to group by item name and category, resample by week, and compute the average price per week. Finally, I would like to output the data as a list of dicts like this:

[
  {
    "item_name": "Michel1",
    "item_category": "Category1", 
    "prices": [
                {"week": "1", "average": "84.2"},
                {"week": "2", "average": "84.2"}
              ]
  },
  {
    "item_name": "Michel1",
    "item_category": "Category2", 
    "prices": [
                {"week": "1", "average": "84.2"},
                {"week": "2", "average": "84.2"}
              ]
  },....
]

I came up with something that groups and averages, but I cannot transform the result into a dict:

df["price"] = df["price"].astype(float)
df["scraping_date"] = pd.to_datetime(df["scraping_date"])
df.set_index("scraping_date").groupby(["item_name","item_category"])["price"].resample("W").mean()

If I call .to_dict() on the result, I get this, which is not quite what I want:

{('Michel1', 'Category1', Timestamp('2017-12-03 00:00:00')): 20.0,
 ('Michel1', 'Category1', Timestamp('2017-12-10 00:00:00')): 20.0,
 ('Michel1', 'Category2', Timestamp('2017-12-17 00:00:00')): 20.0,
 ('Michel1', 'Category2', Timestamp('2017-12-24 00:00:00')): 20.0,
 ('Michel2', 'Category1', Timestamp('2017-12-31 00:00:00')): 20.0,
 ('Michel2', 'Category1', Timestamp('2018-01-07 00:00:00')): 20.0,
}
asked Jun 14 '18 13:06 by a-coruble




1 Answer

I cannot guarantee the speed, but you can do this with groupby and apply:

# ISO week number; .dt.week and to_dict('r') were removed in pandas 2.0
df['Week'] = pd.to_datetime(df.scraping_date).dt.isocalendar().week
df.groupby(['item_name','item_category']).apply(lambda x : x.groupby(['Week']).price.mean().to_frame('average')
.reset_index().to_dict('records')).to_frame('price').reset_index().to_dict('records')
Out[51]: 
[{'item_category': 'Category1',
  'item_name': 'Michel1',
  'price': [{'Week': 15.0, 'average': 21.0},
   {'Week': 16.0, 'average': 105.15}]},
 {'item_category': 'Category2',
  'item_name': 'Michel1',
  'price': [{'Week': 13.0, 'average': 126.2},
   {'Week': 17.0, 'average': 21.2},
   {'Week': 19.0, 'average': 42.0}]},
 {'item_category': 'Category1',
  'item_name': 'Michel2',
  'price': [{'Week': 14.0, 'average': 31.55},
   {'Week': 15.0, 'average': 126.3}]},
 {'item_category': 'Category2',
  'item_name': 'Michel2',
  'price': [{'Week': 15.0, 'average': 21.0},
   {'Week': 17.0, 'average': 126.3},
   {'Week': 18.0, 'average': 42.1}]}]
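
If you would rather keep the resample("W") from the question, the MultiIndex Series it returns can be walked group by group. The following is only a sketch under some assumptions: it uses a small subset of the question's rows, the names weekly and result are made up for illustration, the week is written as the date that labels each weekly bin (not "1", "2" as in the question), and averages are kept as numbers instead of strings.

```python
import pandas as pd

# Small subset of the question's data, for illustration only.
df = pd.DataFrame({
    "item_name": ["Michel1"] * 5,
    "item_category": ["Category1"] * 3 + ["Category2"] * 2,
    "scraping_date": ["2018-04-14", "2018-04-16", "2018-04-17",
                      "2018-04-23", "2018-05-08"],
    "price": [21.0, 42.1, 84.0, 21.2, 42.0],
})
df["scraping_date"] = pd.to_datetime(df["scraping_date"])

# Same pipeline as in the question; the result is a Series indexed by
# (item_name, item_category, scraping_date).
weekly = (
    df.set_index("scraping_date")
      .groupby(["item_name", "item_category"])["price"]
      .resample("W")
      .mean()
      .dropna()  # weeks with no rows inside a group come back as NaN
)

# Build one dict per (item_name, item_category) pair.
result = []
for (name, category), prices in weekly.groupby(level=["item_name", "item_category"]):
    # Drop the two grouping levels so only the weekly timestamps remain.
    per_week = prices.droplevel(["item_name", "item_category"])
    result.append({
        "item_name": name,
        "item_category": category,
        "prices": [
            {"week": ts.strftime("%Y-%m-%d"), "average": round(float(avg), 2)}
            for ts, avg in per_week.items()
        ],
    })
```

With the default "W" frequency each bin is labelled by the Sunday that ends the week, so the "week" values here are dates and not ISO week numbers as in the apply version above. Remove the dropna() call if you want empty weeks to show up with a NaN average.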
answered Oct 13 '22 01:10 by BENY