Divide total sum equally to higher sampled time periods when upsampling with pandas

Tags:

pandas

I am trying to distribute the total of each time period evenly across its sub-periods when upsampling to a higher frequency.

What I did:

>>> rng = pandas.period_range(start='2014-01-01', periods=2, freq='W')
>>> ts = pandas.Series([i+1 for i in range(len(rng))], index=rng)
>>> ts
2013-12-30/2014-01-05    1
2014-01-06/2014-01-12    2
Freq: W-SUN, dtype: int64

>>> ts.resample('D').asfreq()
2013-12-30     1
2013-12-31   NaN
2014-01-01   NaN
2014-01-02   NaN
2014-01-03   NaN
2014-01-04   NaN
2014-01-05   NaN
2014-01-06     2
2014-01-07   NaN
2014-01-08   NaN
2014-01-09   NaN
2014-01-10   NaN
2014-01-11   NaN
2014-01-12   NaN
Freq: D, dtype: float64

What I actually want is:

>>> ts.resample('D', some_miracle_thing)
2013-12-30     1/7
2013-12-31     1/7
2014-01-01     1/7
2014-01-02     1/7
2014-01-03     1/7
2014-01-04     1/7
2014-01-05     1/7
2014-01-06     2/7
2014-01-07     2/7
2014-01-08     2/7
2014-01-09     2/7
2014-01-10     2/7
2014-01-11     2/7
2014-01-12     2/7
Freq: D, dtype: float64

Is there a way to do this, either:

Specifically, e.g. with an x/7 lambda function, or
Generically, so it works independently of the factor 7 (say 24 when going from days to hours)?


asked Aug 08 '14 13:08
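A minimal sketch of the generic behaviour asked for above, assuming the data is a Series with a PeriodIndex like `ts`. `distribute_evenly` is a hypothetical helper, not a pandas API. Instead of `resample`, it expands each coarse period into its sub-periods with `pd.period_range` and divides the value by how many sub-periods there are, so the factor (7, 24, 28 to 31 days per month, ...) never has to be hardcoded.

```python
import pandas as pd


def distribute_evenly(ts: pd.Series, freq: str) -> pd.Series:
    """Hypothetical helper: spread each period's value evenly over its sub-periods.

    Assumes ts has a PeriodIndex and freq is finer than ts.index.freq.
    """
    pieces = []
    for period, value in ts.items():
        # All sub-periods at the target frequency that fall inside this period.
        sub = pd.period_range(period.start_time, period.end_time, freq=freq)
        # Equal share for each sub-period, so the period's total is preserved.
        pieces.append(pd.Series(value / len(sub), index=sub))
    return pd.concat(pieces)


rng = pd.period_range(start="2014-01-01", periods=2, freq="W")
ts = pd.Series([1.0, 2.0], index=rng)
daily = distribute_evenly(ts, "D")  # 7 rows of 1/7, then 7 rows of 2/7
```

Because the divisor is `len(sub)`, the same call also works when the number of sub-periods varies, e.g. monthly to daily, where January gets 31 slots and February 28. The Python loop is fine for modest series; for very long ones a vectorised version may be worth writing.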

1 Answer

I hate this solution, but it works for upsampling when you're unsure of the number of new intervals. Going from week to day is easy, since it's always 7 days per week. But I've found that the number of intervals produced by an upsample is usually unknown, and this solution works for that case.

The idea is to get the number of post-resample intervals into the initial, pre-resample dataframe, then resample again and divide your data by the interval count. Side note: this is for a dataframe, not a series.

import pandas as pd

# Assumes df has a 'timestamp' column, an integer, non-duplicated index and only numeric data columns.
# Create unique group IDs by simply using the existing index.
df['group'] = df.index

# For each original row, count how many post-resample intervals it is forward-filled into.
df['count'] = df.set_index('timestamp').resample('15min').ffill()['group'].value_counts()

# Resample all data again and forward-fill so that the count is included in every row.
df = df.set_index('timestamp').resample('15min').ffill()

# Divide every column by the count, then drop the helper columns.
df = df.div(df['count'], axis=0).reset_index().drop(['group', 'count'], axis=1)

I'd wrap this in a function and tuck it away so I never have to look at it again, with something like:

def distribute_upsample(df, index, freq): ...

where index would be 'timestamp' and freq would be '15min'.
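A sketch of that wrapper, following the steps above. The body is an illustration rather than the answerer's own code; it assumes the rows are sorted by a unique `index` column and that every other column is numeric. It resets the index first so the group-ID trick does not depend on the caller's index.

```python
import pandas as pd


def distribute_upsample(df: pd.DataFrame, index: str, freq: str) -> pd.DataFrame:
    """Spread each row's values evenly over the upsampled slots it is forward-filled into."""
    out = df.reset_index(drop=True)  # guarantee a unique integer index
    out["group"] = out.index         # one group ID per original row

    # How many upsampled slots does each original row expand into?
    upsampled = out.set_index(index).resample(freq).ffill()
    out["count"] = upsampled["group"].value_counts()

    # Upsample again, now carrying the count along, and divide every column by it.
    out = out.set_index(index).resample(freq).ffill()
    out = out.div(out["count"], axis=0)
    return out.drop(columns=["group", "count"]).reset_index()


df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2022-01-01 00:00", "2022-01-01 01:00", "2022-01-01 01:30"]),
    "value": [4.0, 2.0, 1.0],
})
result = distribute_upsample(df, "timestamp", "15min")  # 7 rows, each value 1.0
```

One caveat: `resample` stops at the last timestamp present, so the final original row is spread over a single slot rather than over a full interval.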


answered Oct 16 '22 15:10

elPastor

Question asked by Serbitar.

People also ask

1 Answers

elPastor

Recent Activity

Donate For Us