
Pandas' equivalent of resample for integer index

I'm looking for a pandas equivalent of the resample method for a dataframe whose index isn't a DatetimeIndex but an array of integers, or maybe even floats.

I know that in some cases (this one, for example) resample can easily be replaced by a reindex plus interpolation, but in others (I think) it can't.

For example, if I have

df = pd.DataFrame(np.random.randn(10,2))
withdates = df.set_index(pd.date_range('2012-01-01', periods=10))
withdates.resample('5D').std()  # older pandas versions: withdates.resample('5D', how=np.std)

this gives me

                   0         1
2012-01-01  1.184582  0.492113
2012-01-06  0.533134  0.982562

but I can't produce the same result with df and resample. So I'm looking for something that would work as

 df.resample(5, np.std)

and that would give me

          0         1
0  1.184582  0.492113
5  0.533134  0.982562

Does such a method exist? The only way I was able to create this method was by manually separating df into smaller dataframes, applying np.std and then concatenating everything back, which I find pretty slow and not smart at all.

Cheers

TomCho asked May 23 '16 16:05


1 Answer

Setup

import pandas as pd
import numpy as np

np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(20, 2), columns=['A', 'B'])

You need to create the labels to group by yourself. I'd use:

(df.index.to_series() / 5).astype(int)

This gives a series of values like [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, ...], which you then pass to groupby.
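As a small illustration of the labelling step, here is a sketch on a toy frame (the 12-row data is made up for the example). It uses floor division, which gives the same labels as dividing and casting to int when the index values are non-negative integers.

```python
import numpy as np
import pandas as pd

# Toy data, assumed for illustration only: 12 rows with a default 0..11 index.
df = pd.DataFrame(np.arange(24, dtype=float).reshape(12, 2), columns=["A", "B"])

# Floor division maps index values 0-4 to label 0, 5-9 to label 1, and so on.
labels = df.index.to_series() // 5

# Each distinct label becomes one group; the last group holds only rows 10 and 11.
binned = df.groupby(labels).std()
print(binned)
```

Note that the last bin is simply shorter when the number of rows is not a multiple of the bin size.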

You'll also need to specify the index for the new dataframe. I'd use:

df.index[4::5]

This takes the current index starting at the 5th position (hence the 4) and every 5th position after that, so it looks like [4, 9, 14, 19]. I could have used df.index[::5] to get the starting positions instead, but I went with the ending positions.
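The two labelling choices can be compared side by side. This is a sketch on made-up data, and it assumes the number of rows is an exact multiple of the bin size so that the sliced index has one entry per group.

```python
import numpy as np
import pandas as pd

# Toy data, assumed for illustration only: 20 rows with a default 0..19 index.
df = pd.DataFrame(np.arange(40, dtype=float).reshape(20, 2), columns=["A", "B"])
labels = df.index.to_series() // 5
grouped = df.groupby(labels).std()

# Label each bin by its last index value ...
by_end = grouped.set_index(df.index[4::5])
# ... or by its first index value, matching the 0, 5, ... labels asked for in the question.
by_start = grouped.set_index(df.index[::5])
```

Only the labels differ between the two results; the aggregated values are identical.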

Solution

# assign as variable because I'm going to use it more than once.
s = (df.index.to_series() / 5).astype(int)

df.groupby(s).std().set_index(s.index[4::5])

Looks like:

           A         B
4   0.198019  0.320451
9   0.329750  0.408232
14  0.293297  0.223991
19  0.095633  0.376390
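Since the question also mentions float indices, the same idea can be wrapped in a small function. The sketch below is not a pandas API: `resample_by_step` is a hypothetical helper name, and it labels each bin by its starting value rather than its ending position.

```python
import numpy as np
import pandas as pd

def resample_by_step(frame, step, func="std"):
    """Hypothetical helper: aggregate rows in bins of width `step` over a numeric index."""
    # Bin start for each row: floor(index / step) * step. Works for ints and floats.
    starts = np.floor(frame.index.to_numpy() / step) * step
    return frame.groupby(starts).agg(func)

# Toy data with an irregular float index, assumed for illustration only.
df_float = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=[0.0, 0.5, 1.2, 2.0, 2.5, 3.9],
)
out = resample_by_step(df_float, 2, "mean")
print(out)
```

The resulting labels are floats even for an integer index, and a non-integer `step` may hit floating-point rounding at bin edges, so treat this as a starting point rather than a drop-in replacement for resample.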

Other considerations

This is for the equivalent of down sampling. We haven't addressed up sampling.

To go back from what we've produced to a dataframe indexed by something more frequent, we can use reindex like so:

# assign what we've done above to df_down
df_down = df.groupby(s).std().set_index(s.index[4::5])

df_up = df_down.reindex(range(20)).bfill()

Looks like:

           A         B
0   0.198019  0.320451
1   0.198019  0.320451
2   0.198019  0.320451
3   0.198019  0.320451
4   0.198019  0.320451
5   0.329750  0.408232
6   0.329750  0.408232
7   0.329750  0.408232
8   0.329750  0.408232
9   0.329750  0.408232
10  0.293297  0.223991
11  0.293297  0.223991
12  0.293297  0.223991
13  0.293297  0.223991
14  0.293297  0.223991
15  0.095633  0.376390
16  0.095633  0.376390
17  0.095633  0.376390
18  0.095633  0.376390
19  0.095633  0.376390

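We could also reindex by something else, such as range(0, 20, 2), to up sample to the even integer indices. Because that target does not contain the odd labels 9 and 19, pass the fill method to reindex itself, as in df_down.reindex(range(0, 20, 2), method='bfill'); chaining .bfill() after the reindex would lose those two bins.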
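A sketch of the even-index case, using a made-up df_down with the same [4, 9, 14, 19] labels as above. It contrasts filling inside reindex, which looks up the original labels, with filling after reindex, which only sees the labels that survived.

```python
import numpy as np
import pandas as pd

# Stand-in for the down-sampled frame; the values are assumed for illustration only.
df_down = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]}, index=[4, 9, 14, 19])
even = range(0, 20, 2)

# Filling inside reindex: each new label takes the value of the next original label.
df_even = df_down.reindex(even, method="bfill")

# Filling afterwards: labels 9 and 19 are dropped by reindex before bfill runs,
# so their values never appear and the trailing rows stay NaN.
naive = df_down.reindex(even).bfill()
```

With method="bfill" every bin contributes to the result, whereas the naive version silently skips the bins labelled 9 and 19.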

piRSquared answered Sep 17 '22 21:09