Pandas resample with start date

I'd like to resample a pandas object using a specific date (or month) as the edge of the first bin. For instance, in the following snippet I'd like my first index value to be 2020-02-29 and I'd be happy specifying start=2 or start="2020-02-29".

>>> dates = pd.date_range("2020-01-29", "2021-07-04")
>>> s = pd.Series(range(len(dates)), index=dates)
>>> s.resample('4M').count()
2020-01-31      3
2020-05-31    121
2020-09-30    122
2021-01-31    123
2021-05-31    120
2021-09-30     34
Freq: 4M, dtype: int64

So far, the cleanest approach I can come up with uses pd.cut and groupby:

>>> rule = "4M"
>>> start = pd.Timestamp("2020-02-29") - pd.tseries.frequencies.to_offset(rule)
>>> end = s.index.max() + pd.tseries.frequencies.to_offset(rule)
>>> bins = pd.date_range(start, end, freq=rule)
>>> gb = s.groupby(pd.cut(s.index, bins)).count()
>>> gb.index = gb.index.categories.right
>>> gb
2020-02-29     32
2020-06-30    122
2020-10-31    123
2021-02-28    120
2021-06-30    122
2021-10-31      4
dtype: int64
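
The pd.cut approach above can be wrapped in a small function so the first bin edge becomes a parameter. This is only a sketch: `resample_from` and its `how` argument are hypothetical names, not part of pandas. It assumes `start` already falls on the offset (a month end here), and any rows earlier than one bin length before `start` fall outside the bins and are dropped. Recent pandas versions deprecate the "M" alias in favour of "ME", so the sketch passes the `MonthEnd(4)` offset object directly instead of a string.

```python
import pandas as pd


def resample_from(s, offset, start, how="count"):
    """Hypothetical helper: aggregate `s` into bins whose first label is `start`."""
    offset = pd.tseries.frequencies.to_offset(offset)
    first_edge = pd.Timestamp(start) - offset   # left edge of the first bin
    last_edge = s.index.max() + offset          # ensures the last point is covered
    bins = pd.date_range(first_edge, last_edge, freq=offset)
    # pd.cut builds right-closed intervals by default, like month-end resampling.
    grouped = s.groupby(pd.cut(s.index, bins), observed=False).agg(how)
    grouped.index = grouped.index.categories.right  # label bins by their right edge
    return grouped


dates = pd.date_range("2020-01-29", "2021-07-04")
s = pd.Series(range(len(dates)), index=dates)

# MonthEnd(4) is the offset object behind the "4M" rule used above.
counts = resample_from(s, pd.offsets.MonthEnd(4), "2020-02-29")
print(counts)
```

Because the grouping goes through groupby, the same helper should also accept other aggregations such as `how="sum"` or `how="mean"`.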

My answer feels a little hacky, but uses resample and gives the desired output. Find the date one bin length (e.g. 4 months, or month ends specifically) before the specified date, append it to s, and then resample:

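import numpy as np
import pandas as pd

rule = '4M'
date = '02-29-2020'

# One bin length before the desired first edge (2019-10-31 here)
base_date = pd.to_datetime(date) - pd.tseries.frequencies.to_offset(rule)
# Sentinel row that anchors the bin edges; note that this modifies s in place
s.loc[base_date] = np.nan
output = s.resample(rule=rule).count()
# Drop the empty bin created by the sentinel
output = output[output.index >= date]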

Result:

2020-02-29     32
2020-06-30    122
2020-10-31    123
2021-02-28    120
2021-06-30    122
2021-10-31      4
Freq: 4M, dtype: int64

I added output = output[output.index >= date] because otherwise you get an additional empty bin at the start:

2019-10-31      0
2020-02-29     32
2020-06-30    122
2020-10-31    123
2021-02-28    120
2021-06-30    122
2021-10-31      4
Freq: 4M, dtype: int64
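
If modifying s is undesirable, the same sentinel trick can be applied to a copy. The following is a minimal sketch under a few assumptions: it rebuilds the example series from the question, uses the `MonthEnd(4)` offset object in place of the "4M" string (recent pandas versions deprecate the "M" alias), and the name `padded` is only illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2020-01-29", "2021-07-04")
s = pd.Series(range(len(dates)), index=dates)

offset = pd.offsets.MonthEnd(4)        # offset object behind the "4M" rule
date = pd.Timestamp("2020-02-29")      # desired label of the first bin
base_date = date - offset              # one bin length earlier: 2019-10-31

padded = s.astype("float64")           # astype returns a copy, so s stays untouched
padded.loc[base_date] = np.nan         # sentinel row that anchors the bin edges
padded = padded.sort_index()

output = padded.resample(offset).count()   # count() ignores the NaN sentinel
output = output[output.index >= date]      # drop the empty bin created by the sentinel
print(output)
```

The cast to float is needed only because the sentinel is NaN; count() still returns integer counts.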

All you need is pd.cut, applied to the bins defined in the question:

>>> gb = pd.cut(s.index, bins).value_counts()
>>> gb.index = gb.index.categories.right
>>> gb
2020-02-29     32
2020-06-30    122
2020-10-31    123
2021-02-28    120
2021-06-30    122
2021-10-31      4
dtype: int64

There is no need to use groupby.
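
Since the snippet above relies on `bins` from the question, here is a self-contained sketch that rebuilds them. It assumes the example series from the question and uses the `MonthEnd(4)` offset object rather than the "4M" string; the names `cats` and `sums` are illustrative. It also shows where the shortcut stops applying: value_counts only sees the index, so any aggregation over the values (a sum, for example) still goes through groupby.

```python
import pandas as pd

dates = pd.date_range("2020-01-29", "2021-07-04")
s = pd.Series(range(len(dates)), index=dates)

offset = pd.offsets.MonthEnd(4)        # offset object behind the "4M" rule
bins = pd.date_range(
    pd.Timestamp("2020-02-29") - offset,   # left edge of the first bin
    s.index.max() + offset,                # edge past the last data point
    freq=offset,
)

cats = pd.cut(s.index, bins)           # Categorical of right-closed intervals
counts = cats.value_counts()           # one count per interval, in bin order
counts.index = counts.index.categories.right

# Aggregating the values themselves still needs groupby.
sums = s.groupby(cats, observed=False).sum()
sums.index = sums.index.categories.right
print(counts)
print(sums)
```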

jsignell

Tom

Mohamad Al Mdfaa
