Pandas resample data frame with fixed number of rows

Tags:

pandas

With pandas.DataFrame.resample I can downsample a DataFrame into a certain time duration:

df.resample("3s").mean()

However, I do not want to specify a certain time, but rather a fixed number of rows in the original data frame, e.g. "resample such that three rows previously are now aggregated into one". How's that possible in pandas?

857

asked Jun 01 '17 11:06

knub

1 Answer

It might be a bit late, but here is my answer for everyone searching for a solution to this problem.

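One solution would be to use pandas' rolling(n) sliding-window functionality and then select every nth value, starting at row n-1 so that each kept value covers one complete, non-overlapping block of n rows (the first n-1 rolling results are NaN because their window is incomplete). E.g., for n=3:

df_sub = df.rolling(3).mean()[2::3]

This is a bit wasteful computationally, since you compute the rolling mean for every row of the dataframe and then keep only 1/n of the results.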
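The recomputation mentioned above can be avoided by grouping on the integer row position instead of rolling. The following is a minimal sketch, not part of the original answer: the toy DataFrame and the names `n` and `block_id` are made up for illustration. It only uses `DataFrame.groupby` with an array of labels and integer division, and compares the result to the rolling variant.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): 9 rows, two numeric columns.
df = pd.DataFrame({
    "a": np.arange(9, dtype=float),
    "b": np.arange(9, dtype=float) * 10,
})
n = 3  # number of original rows to aggregate into one

# Integer-dividing the row position by n labels each block of n
# consecutive rows: 0,0,0,1,1,1,2,2,2
block_id = np.arange(len(df)) // n

# One mean per block; each row is touched exactly once.
df_grouped = df.groupby(block_id).mean()

# Rolling variant for comparison. The slice starts at n-1 because the
# rolling mean at row i covers rows i-n+1 .. i.
df_rolled = df.rolling(n).mean()[n - 1::n]

print(df_grouped)
print(df_rolled)
```

If the number of rows is not a multiple of n, the groupby version still returns a final row computed from the leftover rows, whereas the rolling version drops them.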

Another, similar approach to the problem, which does not calculate the mean but instead interpolates the whole dataframe column-wise, would be to use numpy's interp function.

E.g.: Assuming you have a DataFrame whose indices are monotonically increasing numerical/timestamped values (as is usual with time series data), and you want to adjust every column individually, you could do it like this:

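import numpy as np
import pandas as pd

def resample_fixed(df, n_new):
    # Resample df to n_new rows by linear interpolation, column by column.
    n_old, m = df.values.shape
    mat_old = df.values
    mat_new = np.zeros((n_new, m))
    # Positions of the old and new rows over the same index range.
    # linspace treats the existing rows as evenly spaced.
    x_old = np.linspace(df.index.min(), df.index.max(), n_old)
    x_new = np.linspace(df.index.min(), df.index.max(), n_new)

    # Interpolate each column onto the new positions.
    for j in range(m):
        y_old = mat_old[:, j]
        y_new = np.interp(x_new, x_old, y_old)
        mat_new[:, j] = y_new

    return pd.DataFrame(mat_new, index=x_new, columns=df.columns)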

Be careful, though: np.interp does alter your data, since it linearly interpolates between your data points. I would recommend inspecting the result after interpolation.
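To illustrate the warning, here is a small sketch with made-up data (the column name `y` and the values are hypothetical, and the index is assumed to be numeric and evenly spaced). `resample_fixed` is repeated from above only so that the block runs by itself. Downsampling a zig-zag signal from 5 to 3 rows happens to sample only the valleys, so the peaks disappear entirely.

```python
import numpy as np
import pandas as pd

def resample_fixed(df, n_new):
    # Same function as in the answer, repeated so this example is self-contained.
    n_old, m = df.values.shape
    mat_old = df.values
    mat_new = np.zeros((n_new, m))
    x_old = np.linspace(df.index.min(), df.index.max(), n_old)
    x_new = np.linspace(df.index.min(), df.index.max(), n_new)
    for j in range(m):
        mat_new[:, j] = np.interp(x_new, x_old, mat_old[:, j])
    return pd.DataFrame(mat_new, index=x_new, columns=df.columns)

# Zig-zag signal on a numeric index 0..4 (hypothetical data).
df = pd.DataFrame({"y": [0.0, 10.0, 0.0, 10.0, 0.0]}, index=[0, 1, 2, 3, 4])

# Downsample to 3 rows: new positions are 0, 2 and 4, which are all valleys.
small = resample_fixed(df, 3)

# Upsample to 9 rows: new positions are 0, 0.5, ..., 4, which adds
# midpoints between the original samples.
large = resample_fixed(df, 9)

print(small)
print(large)
```

Comparing `small` with `df` shows why inspecting the output matters: the maximum of the original column is 10, while the downsampled column is all zeros.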

You can find a full example of the interpolation in a gist I wrote for that here.

114

answered Sep 30 '22 15:09

Tobi
