I am trying to distribute the total sum of a time period evenly to the components of the higher sampled time period.
What I did:
>>> rng = pandas.PeriodIndex(start='2014-01-01', periods=2, freq='W')
>>> ts = pandas.Series([i+1 for i in range(len(rng))], index=rng)
>>> ts
2013-12-30/2014-01-05 1
2014-01-06/2014-01-12 2
Freq: W-SUN, dtype: float64
>>> ts.resample('D')
2013-12-30 1
2013-12-31 NaN
2014-01-01 NaN
2014-01-02 NaN
2014-01-03 NaN
2014-01-04 NaN
2014-01-05 NaN
2014-01-06 2
2014-01-07 NaN
2014-01-08 NaN
2014-01-09 NaN
2014-01-10 NaN
2014-01-11 NaN
2014-01-12 NaN
Freq: D, dtype: float64
What I actually want is:
>>> ts.resample('D', some_miracle_thing)
2013-12-30 1/7
2013-12-31 1/7
2014-01-01 1/7
2014-01-02 1/7
2014-01-03 1/7
2014-01-04 1/7
2014-01-05 1/7
2014-01-06 2/7
2014-01-07 2/7
2014-01-08 2/7
2014-01-09 2/7
2014-01-10 2/7
2014-01-11 2/7
2014-01-12 2/7
Freq: D, dtype: float64
Is there a way to do this without computing the x/7 values by hand, e.g. with a lambda function?

First ensure that your dataframe has an index of type DatetimeIndex. Then use the resample function to either upsample (higher frequency) or downsample (lower frequency) your dataframe, and apply an aggregator (e.g. sum) to combine the values across the new sampling frequency.
To resample time series data means to summarize or aggregate the data by a new time period.
Resample time-series data. Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or the caller must pass the label of a datetime-like series/index to the on/level keyword parameter.
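For the week-to-day case in the question, where every weekly period expands into exactly 7 daily rows, a minimal sketch of the missing "some_miracle_thing" step could look like the following (assumptions: a reasonably recent pandas where the fill method is chained onto the resampler explicitly, and pd.period_range in place of the older PeriodIndex(start=...) constructor):

import pandas as pd

# Rebuild the example series from the question.
rng = pd.period_range(start='2014-01-01', periods=2, freq='W')
ts = pd.Series([i + 1 for i in range(len(rng))], index=rng)

# Upsample to daily, forward-fill each week's value onto its 7 days,
# then spread the weekly total evenly across them.
daily = ts.resample('D').ffill() / 7
print(daily)

This only works when every source period covers the same, known number of target rows (7 days per week here); for the general case where that count is unknown, see the counting approach below.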
I hate this solution, but it works for upsampling when you're unsure of the number of new intervals. Going from week to day is easy, since it's always 7 days per week, but I've found the number of intervals produced by an upsample is often unknown up front - this solution handles that case.
The idea is to get the number of post-resample intervals into the initial pre-resampled dataframe, then resample again and divide your data by the interval count. Side note - this is for a dataframe, not a series.
# Create unique group IDs by simply using the existing index
# (assumes an integer, non-duplicated index).
df['group'] = df.index
# Count how many post-resample rows each original row expands into.
# value_counts() is indexed by the group IDs, so it aligns back onto the original rows.
df['count'] = df.set_index('timestamp').resample('15min').ffill()['group'].value_counts()
# Resample all data again and forward-fill so the count is now included in every row.
df = df.set_index('timestamp').resample('15min').ffill()
# Divide the entire dataframe by the interval count and clean up.
df = df.div(df['count'], axis=0).reset_index().drop(['group', 'count'], axis=1)
I'd wrap this in a function and tuck it away so I never have to look at it again, with something like:
def distribute_upsample(df, index, freq):
where index would be 'timestamp' and freq would be '15min', as in the sketch below.
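Putting that together, a sketch of such a helper (the name and signature are just the suggestion above; the body reuses the steps from the snippet, so the same integer, non-duplicated index and numeric-columns assumptions apply):

import pandas as pd

def distribute_upsample(df: pd.DataFrame, index: str, freq: str) -> pd.DataFrame:
    """Upsample df to `freq` and spread each row's values evenly
    across the rows it expands into."""
    df = df.copy()
    # Unique group ID per original row (assumes an integer, non-duplicated index).
    df['group'] = df.index
    # How many post-resample rows does each original row expand into?
    df['count'] = df.set_index(index).resample(freq).ffill()['group'].value_counts()
    # Upsample everything, forward-filling both the data and the counts.
    df = df.set_index(index).resample(freq).ffill()
    # Distribute each value evenly, then drop the helper columns.
    return (df.div(df['count'], axis=0)
              .reset_index()
              .drop(['group', 'count'], axis=1))

Usage would then be a one-liner like distribute_upsample(df, 'timestamp', '15min').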