Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find the average of previous sales at each time in python

Tags:

python

pandas

I have a csv file with four columns: date, wholesaler, product, and sales. I am looking for finding average of previous sales for each Product and Wholesaler combination at each date. It means what is the average previous sales of product 'A' at wholesaler 'B' at time 'C'.

For instance we know sales of product 'A' at wholesaler 'B' at Jan, Apr, May, Aug that are 100, 200, 300, 400 respectively. Let assume we do not have any record before Jan. So the average of previous sale of product 'A' in wholesaler 'B' at Apr is equal to 100/1, and at May is equal to (200+100)/2 and at Aug is (300+200+100)/3.

The following table shows my data:

date    wholesaler product  sales
12/31/2012  53929  UPE54     4
12/31/2012  13131  UPE55     1
2/23/2013   13131  UPE55  1156
4/24/2013   13131  UPE55     1
12/1/2013   83389  UPE54     9
12/17/2013  83389  UPE54     1
12/18/2013  52237  UPE54     9
12/19/2013  53929  UME24     1
12/31/2013  82204  UPE55     9
12/31/2013  11209  UME24     4
12/31/2013  52237  UPE54     1

Now I am using this code:

df = pd.read_csv('Sample.csv',index_col='date')
df2 = df.groupby(['wholesaler','product'])['sales'].mean()

That gives an average sales for each wholesaler-product while I am looking for average of previous sales at each date.

wholesaler product   avg sales
    11209  UME24      4.00
    13131  UPE55    713.00
    22423  UME24      1.00
    24302  U4E16    121.00 

Thank you for your help!

like image 414
Ashkan Mirzaee Avatar asked Feb 07 '23 03:02

Ashkan Mirzaee


1 Answers

It was quite tricky for me but works anyhow. Expecting more elegant solution from others.

import pandas as pd
import datetime

dateparse = lambda x: pd.datetime.strptime(x, '%m/%d/%Y')
df = pd.read_csv('Sample.csv',index_col='date', parse_dates=[0], date_parser=dateparse)

expd_gb = df.reset_index().groupby(['wholesaler', 'product'])['sales'].apply(pd.Series.expanding)
idx = df.reset_index().groupby(['wholesaler', 'product', 'date'])['sales'].count().index

cnct = pd.concat([expd_gb.iloc[n].mean().shift(1) for n in range(len(expd_gb))])
cnct.index = idx

cnct.to_csv('TotalAvg.csv')

Result,

wholesaler  product  date      
11209       UME24    2013-12-31     NaN
13131       UPE55    2012-12-31     NaN
                     2013-02-23     1.0
                     2013-04-24     578.5
52237       UPE54    2013-12-18     NaN
                     2013-12-31     9.0
53929       UME24    2013-12-19     NaN
            UPE54    2012-12-31     NaN
82204       UPE55    2012-12-31     NaN
83389       UPE54    2013-12-01     NaN
                     2013-12-17     9.0
like image 135
su79eu7k Avatar answered Feb 22 '23 23:02

su79eu7k