Formatting pandas dataseries grouped by two columns and resampled on third with a mean to a dict

Tags: python, json, datetime, pandas

I have a DataFrame like this one:

    item_name   item_category   scraping_date        price
0   Michel1     Category1       2018-04-14           21.0
1   Michel1     Category1       2018-04-16           42.1
2   Michel1     Category1       2018-04-17           84.0
3   Michel1     Category1       2018-04-19           126.2
4   Michel1     Category1       2018-04-20           168.3
5   Michel1     Category2       2018-04-23           21.2
6   Michel1     Category2       2018-05-08           42.0
7   Michel1     Category2       2018-03-26           84.1
8   Michel1     Category2       2018-03-31           126.2
9   Michel1     Category2       2018-04-01           168.3
10  Michel2     Category1       2018-04-04           21.0
11  Michel2     Category1       2018-04-05           42.1
12  Michel2     Category1       2018-04-09           84.2
13  Michel2     Category1       2018-04-11           126.3
14  Michel2     Category1       2018-04-12           168.4
15  Michel2     Category2       2018-04-13           21.0
16  Michel2     Category2       2018-05-03           42.1
17  Michel2     Category2       2018-04-25           84.2
18  Michel2     Category2       2018-04-28           126.3
19  Michel2     Category2       2018-04-29           168.4
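To reproduce the question, the sample frame can be rebuilt directly from the table. This is a minimal sketch: the column names and values are copied from the table, and the dates are kept as strings exactly as shown (the asker's code converts them with `pd.to_datetime` later).

```python
import pandas as pd

# Rebuild the sample frame from the table (dates kept as strings, as shown)
names = ["Michel1"] * 10 + ["Michel2"] * 10
categories = (["Category1"] * 5 + ["Category2"] * 5) * 2
dates = [
    "2018-04-14", "2018-04-16", "2018-04-17", "2018-04-19", "2018-04-20",
    "2018-04-23", "2018-05-08", "2018-03-26", "2018-03-31", "2018-04-01",
    "2018-04-04", "2018-04-05", "2018-04-09", "2018-04-11", "2018-04-12",
    "2018-04-13", "2018-05-03", "2018-04-25", "2018-04-28", "2018-04-29",
]
prices = [
    21.0, 42.1, 84.0, 126.2, 168.3,
    21.2, 42.0, 84.1, 126.2, 168.3,
    21.0, 42.1, 84.2, 126.3, 168.4,
    21.0, 42.1, 84.2, 126.3, 168.4,
]
df = pd.DataFrame({
    "item_name": names,
    "item_category": categories,
    "scraping_date": dates,
    "price": prices,
})
print(df.head())
```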

I would like to group by item name and category, resample by week, and compute the average price per week. Finally, I would like to output the data as a list of dicts like this:

[
  {
    "item_name": "Michel1",
    "item_category": "Category1", 
    "prices": [
                {"week": "1", "average": "84.2"},
                {"week": "2", "average": "84.2"}
              ]
  },
  {
    "item_name": "Michel1",
    "item_category": "Category2", 
    "prices": [
                {"week": "1", "average": "84.2"},
                {"week": "2", "average": "84.2"}
              ]
  },....
]

I came up with something that groups the data and computes the average, but I cannot transform it into a dict:

import pandas as pd

df["price"] = df["price"].astype(float)
df["scraping_date"] = pd.to_datetime(df["scraping_date"])
df.set_index("scraping_date").groupby(["item_name", "item_category"])["price"].resample("W").mean()
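For reference, here is a sketch of what that resample chain returns, run on a subset of the sample data (only the Michel1 rows, to keep it short). `resample("W")` uses weeks ending on Sunday and labels each bin with that Sunday's date; weeks with no rows inside a group's date range come back as NaN, so they usually need dropping before building any output. The three-level index can then be flattened into ordinary columns with `reset_index()`.

```python
import pandas as pd

# Michel1 rows from the sample table (a subset, to keep the example short)
df = pd.DataFrame({
    "item_name": ["Michel1"] * 10,
    "item_category": ["Category1"] * 5 + ["Category2"] * 5,
    "scraping_date": [
        "2018-04-14", "2018-04-16", "2018-04-17", "2018-04-19", "2018-04-20",
        "2018-04-23", "2018-05-08", "2018-03-26", "2018-03-31", "2018-04-01",
    ],
    "price": [21.0, 42.1, 84.0, 126.2, 168.3, 21.2, 42.0, 84.1, 126.2, 168.3],
})
df["scraping_date"] = pd.to_datetime(df["scraping_date"])

# Weekly mean per (item_name, item_category); "W" bins end on Sunday
weekly = (
    df.set_index("scraping_date")
      .groupby(["item_name", "item_category"])["price"]
      .resample("W")
      .mean()
)

# Empty weeks are NaN; drop them, then turn the index levels
# (item_name, item_category, scraping_date) into columns
flat = weekly.dropna().reset_index()
print(flat)
```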

If I call .to_dict() on the result, I get this, which is not quite what I want:

{('Michel1', 'Category1', Timestamp('2017-12-03 00:00:00')): 20.0,
 ('Michel1', 'Category1', Timestamp('2017-12-10 00:00:00')): 20.0,
 ('Michel1', 'Category2', Timestamp('2017-12-17 00:00:00')): 20.0,
 ('Michel1', 'Category2', Timestamp('2017-12-24 00:00:00')): 20.0,
 ('Michel2', 'Category1', Timestamp('2017-12-31 00:00:00')): 20.0,
 ('Michel2', 'Category1', Timestamp('2018-01-07 00:00:00')): 20.0,
}
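`to_dict()` on a Series with a MultiIndex always produces tuple keys. One way to get the nested layout instead is to group the weekly Series by its first two index levels and build each record by hand. This is a minimal sketch using only the Michel2 rows of the sample data. Two choices in it are assumptions, since the question does not define them: "week" is a hypothetical 1-based count of weeks since the group's first resampled week, and averages are rounded to 2 decimals and stored as strings to mirror the desired output.

```python
import pandas as pd

# Michel2 rows from the sample table (a subset, to keep the example short)
df = pd.DataFrame({
    "item_name": ["Michel2"] * 10,
    "item_category": ["Category1"] * 5 + ["Category2"] * 5,
    "scraping_date": pd.to_datetime([
        "2018-04-04", "2018-04-05", "2018-04-09", "2018-04-11", "2018-04-12",
        "2018-04-13", "2018-05-03", "2018-04-25", "2018-04-28", "2018-04-29",
    ]),
    "price": [21.0, 42.1, 84.2, 126.3, 168.4, 21.0, 42.1, 84.2, 126.3, 168.4],
})

weekly = (
    df.set_index("scraping_date")
      .groupby(["item_name", "item_category"])["price"]
      .resample("W")
      .mean()
)

records = []
# Split the weekly means by (item_name, item_category) instead of calling
# to_dict() on the whole Series, which would give tuple keys
for (name, category), series in weekly.groupby(level=["item_name", "item_category"]):
    week_ends = series.index.get_level_values("scraping_date")
    prices = []
    for week_end, avg in zip(week_ends, series):
        if pd.isna(avg):  # skip weeks with no observations
            continue
        # Hypothetical numbering: week 1 is the group's first resampled week
        week_no = (week_end - week_ends[0]).days // 7 + 1
        prices.append({"week": str(week_no), "average": str(round(avg, 2))})
    records.append({"item_name": name, "item_category": category, "prices": prices})

print(records)
```

Empty weeks are skipped but still counted, so Category2 ends up with weeks 1, 3 and 4.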


asked Jun 14 '18 13:06

a-coruble

1 Answer

I cannot guarantee the speed, but this works using groupby with apply:

import pandas as pd

df['Week'] = pd.to_datetime(df.scraping_date).dt.isocalendar().week.astype(int)  # ISO week number
df.groupby(['item_name', 'item_category'])[['Week', 'price']].apply(
    lambda x: x.groupby('Week').price.mean().to_frame('average').reset_index().to_dict('records')
).to_frame('price').reset_index().to_dict('records')
Out[51]: 
[{'item_category': 'Category1',
  'item_name': 'Michel1',
  'price': [{'Week': 15, 'average': 21.0},
   {'Week': 16, 'average': 105.15}]},
 {'item_category': 'Category2',
  'item_name': 'Michel1',
  'price': [{'Week': 13, 'average': 126.2},
   {'Week': 17, 'average': 21.2},
   {'Week': 19, 'average': 42.0}]},
 {'item_category': 'Category1',
  'item_name': 'Michel2',
  'price': [{'Week': 14, 'average': 31.55},
   {'Week': 15, 'average': 126.3}]},
 {'item_category': 'Category2',
  'item_name': 'Michel2',
  'price': [{'Week': 15, 'average': 21.0},
   {'Week': 17, 'average': 126.3},
   {'Week': 18, 'average': 42.1}]}]
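The result above uses the keys `price`, `Week` and `average` and numeric values, while the question asks for `prices`, `week` and string values. A small post-processing step can rename the keys and emit JSON. This sketch copies the first two records of the result above as literal data; note that "week" here stays the ISO week number the answer computes, not the 1, 2, ... numbering in the question's example.

```python
import json

# First two records of the answer's result (copied from the output above)
result = [
    {"item_category": "Category1", "item_name": "Michel1",
     "price": [{"Week": 15, "average": 21.0}, {"Week": 16, "average": 105.15}]},
    {"item_category": "Category2", "item_name": "Michel1",
     "price": [{"Week": 13, "average": 126.2}, {"Week": 17, "average": 21.2},
               {"Week": 19, "average": 42.0}]},
]

# Rename keys to the layout asked for in the question and stringify values;
# int() makes both 15 and 15.0 come out as "15"
reshaped = [
    {
        "item_name": rec["item_name"],
        "item_category": rec["item_category"],
        "prices": [
            {"week": str(int(p["Week"])), "average": str(p["average"])}
            for p in rec["price"]
        ],
    }
    for rec in result
]

print(json.dumps(reshaped, indent=2))
```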

answered Oct 13 '22 01:10

BENY
