Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas resampling from months to weeks

Tags:

python

pandas

I am attempting to downsample monthly data to weekly data and have a time series dataframe of months that looks like this:

             qty
PERIOD_NAME 
2017-09-01  49842.0
2017-10-01  27275.0
2017-11-01  29159.0
2017-12-01  51344.0
2018-01-01  19103.0
2018-02-01  23570.0
2018-03-01  45139.0
2018-04-01  25722.0
2018-05-01  22644.0

I've attempted using a resample to weeks like this:

tgt_item_by_445_wk = tgt_item_by_445_wk.resample('W').sum()

which yields:

             qty
PERIOD_NAME 
2017-09-03  49842.0
2017-09-10  0.0
2017-09-17  0.0
2017-09-24  0.0
2017-10-01  27275.0
2017-10-08  0.0
2017-10-15  0.0
2017-10-22  0.0
2017-10-29  0.0

I've tried interpolation, but I can't get what I am looking for, which is a fill of the unsampled (0's) with an even split of the first value like this:

              qty
PERIOD_NAME 
2017-09-03  12460.5
2017-09-10  12460.5
2017-09-17  12460.5
2017-09-24  12460.5
2017-10-01  5455.0
2017-10-08  5455.0
2017-10-15  5455.0
2017-10-22  5455.0
2017-10-29  5455.0

Is there some method using resample, fills and interpolation that allows this?

like image 824
gman123 Avatar asked Jun 16 '18 19:06

gman123


People also ask

How do I resample data in pandas?

Pandas Series: resample() functionThe resample() function is used to resample time-series data. Convenience method for frequency conversion and resampling of time series. Object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or pass datetime-like values to the on or level keyword.

What is Panda resampling?

As previously mentioned, resample() is a method of pandas dataframes that can be used to summarize data by date or time. The . sum() method will add up all values for each resampling period (e.g. for each day) to provide a summary output value for that period.

How do you convert daily data to weekly in Python?

Method 1: using Python for-loops. Function new_case_count() takes in DataFrame object, iterates over it and converts indexes, which are dates in string format, to Pandas Datetime format. Based on the date's day of the week, each week's new cases count is calculated and stored in a list.


1 Answers

Let's try asfreq and groupby.

v = df.asfreq('W', method='ffill')
v /= v.groupby(v.index.strftime('%Y-%m')).transform('count')

                  qty
PERIOD_NAME          
2017-09-03   12460.50
2017-09-10   12460.50
2017-09-17   12460.50
2017-09-24   12460.50
2017-10-01    5455.00
2017-10-08    5455.00
2017-10-15    5455.00
2017-10-22    5455.00
2017-10-29    5455.00
2017-11-05    7289.75
2017-11-12    7289.75
2017-11-19    7289.75
2017-11-26    7289.75
2017-12-03   10268.80
2017-12-10   10268.80
2017-12-17   10268.80
2017-12-24   10268.80
2017-12-31   10268.80
2018-01-07    4775.75
2018-01-14    4775.75
2018-01-21    4775.75
2018-01-28    4775.75
2018-02-04    5892.50
2018-02-11    5892.50
2018-02-18    5892.50
2018-02-25    5892.50
2018-03-04   11284.75
2018-03-11   11284.75
2018-03-18   11284.75
2018-03-25   11284.75
2018-04-01    5144.40
2018-04-08    5144.40
2018-04-15    5144.40
2018-04-22    5144.40
2018-04-29    5144.40

This works well since your values are always on the first of each month. Alternatively, you may use

v /= v.groupby(v.qty).transform('count').values 

for the second step.

like image 103
cs95 Avatar answered Oct 19 '22 21:10

cs95