I have a data set which is aggregated between two dates and I want to de-aggregate it daily by dividing total number with days between these dates. As a sample
StoreID Date_Start Date_End Total_Number_of_sales
78 12/04/2015 17/05/2015 79089
80 12/04/2015 17/05/2015 79089
The data set I want is:
StoreID Date Number_Sales
78 12/04/2015 79089/38(as there are 38 days in between)
78 13/04/2015 79089/38(as there are 38 days in between)
78 14/04/2015 79089/38(as there are 38 days in between)
78 ...
78 17/05/2015 79089/38(as there are 38 days in between)
Any help would be useful. Thanks
I'm not sure if this is exactly what you want but you can try this (I've added another imaginary row):
import datetime as dt
df = pd.DataFrame({'date_start':['12/04/2015','17/05/2015'],
'date_end':['18/05/2015','10/06/2015'],
'sales':[79089, 1000]})
df['date_start'] = pd.to_datetime(df['date_start'], format='%d/%m/%Y')
df['date_end'] = pd.to_datetime(df['date_end'], format='%d/%m/%Y')
df['days_diff'] = (df['date_end'] - df['date_start']).dt.days
master_df = pd.DataFrame(None)
for row in df.index:
new_df = pd.DataFrame(index=pd.date_range(start=df['date_start'].iloc[row],
end = df['date_end'].iloc[row],
freq='d'))
new_df['number_sales'] = df['sales'].iloc[row] / df['days_diff'].iloc[row]
master_df = pd.concat([master_df, new_df], axis=0)
First convert string dates to datetime objects (so you can calculate number of days in between ranges), then create a new index based on the date range, and divide sales. The loop sticks each row of your dataframe into an "expanded" dataframe and then concatenates them into one master dataframe.
What about creating a new dataframe?
start = pd.to_datetime(df['Date_Start'].values[0], dayfirst=True)
end = pd.to_datetime(df['Date_End'].values[0], dayfirst=True)
idx = pd.DatetimeIndex(start=start, end=end, freq='D')
res = pd.DataFrame(df['Total_Number_of_sales'].values[0]/len(idx), index=idx, columns=['Number_Sales'])
yields
In[42]: res.head(5)
Out[42]:
Number_Sales
2015-04-12 2196.916667
2015-04-13 2196.916667
2015-04-14 2196.916667
2015-04-15 2196.916667
2015-04-16 2196.916667
If you have multiple stores (according to your comment and edit), then you could loop over all rows, calculate sales and concatenate the resulting dataframes afterwards.
df = pd.DataFrame({'Store_ID': [78, 78, 80],
'Date_Start': ['12/04/2015', '18/05/2015', '21/06/2015'],
'Date_End': ['17/05/2015', '10/06/2015', '01/07/2015'],
'Total_Number_of_sales': [79089., 50000., 25000.]})
to_concat = []
for _, row in df.iterrows():
start = pd.to_datetime(row['Date_Start'], dayfirst=True)
end = pd.to_datetime(row['Date_End'], dayfirst=True)
idx = pd.DatetimeIndex(start=start, end=end, freq='D')
sales = [row['Total_Number_of_sales']/len(idx)] * len(idx)
id = [row['Store_ID']] * len(idx)
res = pd.DataFrame({'Store_ID': id, 'Number_Sales':sales}, index=idx)
to_concat.append(res)
res = pd.concat(to_concat)
There are definitley more elegant solutions, have a look for example at this thread.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With