
pandas resampling without performing statistics

I have a DataFrame sampled at five-minute intervals:

import numpy as np
import pandas as pd
rng = pd.date_range('1/1/2011', periods=60, freq='5min')
df = pd.DataFrame(np.random.randn(60, 4), index=rng, columns=['A', 'B', 'C', 'D'])

                            A         B         C         D
2011-01-01 00:00:00  1.287045 -0.621473  0.482130  1.886648
2011-01-01 00:05:00  0.402645 -1.335942 -0.609894 -0.589782
2011-01-01 00:10:00 -0.311789  0.342995 -0.875089 -0.781499
2011-01-01 00:15:00  1.970683  0.471876  1.042425 -0.128274
2011-01-01 00:20:00 -1.900357 -0.718225 -3.168920 -0.355735
2011-01-01 00:25:00  1.128843 -0.097980  1.130860 -1.045019
2011-01-01 00:30:00 -0.261523  0.379652 -0.385604 -0.910902

I would like to keep only the rows that fall on the 15-minute interval, without aggregating them into a statistic (I don't want the mean, median, or standard deviation). I want to subsample and get the actual data at each 15-minute mark. Is there a built-in method to do this?

The desired output would be:

                            A         B         C         D                 
2011-01-01 00:00:00  1.287045 -0.621473  0.482130  1.886648                 
2011-01-01 00:15:00  1.970683  0.471876  1.042425 -0.128274                 
2011-01-01 00:30:00 -0.261523  0.379652 -0.385604 -0.910902                 
John Saraceno asked Feb 09 '16 22:02


2 Answers

You can resample to 15 min and take the 'first' of each group:

In [40]: df.resample('15min').first()
Out[40]:
                            A         B         C         D
2011-01-01 00:00:00 -0.415637 -1.345454  1.151189 -0.834548
2011-01-01 00:15:00  0.221777 -0.866306  0.932487 -1.243176
2011-01-01 00:30:00 -0.690039  0.778672 -0.527087 -0.156369
...
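
As a sanity check (a sketch added for illustration, not part of the original answer), the snippet below rebuilds a frame like the one in the question and confirms that on a complete, regular grid `resample('15min').first()` returns the original rows at :00, :15, :30 and :45. It also shows one caveat: `first()` picks the first non-null value per column, so a NaN in the first row of a bin is replaced by the next value in that bin. The seed and the names `sub`, `df_gap` and `sub_gap` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Rebuild a 5-minute frame like the one in the question (the seed is arbitrary).
rng = pd.date_range("2011-01-01", periods=60, freq="5min")
values = np.random.default_rng(0).standard_normal((60, 4))
df = pd.DataFrame(values, index=rng, columns=["A", "B", "C", "D"])

# Take the first row of every 15-minute bin.
sub = df.resample("15min").first()

# On a complete, regular grid this is the same as keeping every third row.
same_as_every_third = np.array_equal(sub.to_numpy(), df.iloc[::3].to_numpy())

# Caveat: first() skips NaN column by column.
df_gap = df.copy()
df_gap.iloc[0, 0] = np.nan            # knock out A at 00:00
sub_gap = df_gap.resample("15min").first()

# A at 00:00 is now taken from the 00:05 row of the same bin.
filled_from_next_row = sub_gap.iloc[0, 0] == df.iloc[1, 0]

print(same_as_every_third, filled_from_next_row)
```

If the data can contain NaN and you need the row exactly at the bin edge, the reindex or asfreq approaches avoid this mixing.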

Another way to do this is to construct the desired new index and reindex to it. This is a bit more work in this case, but for an irregular time series it ensures you get the data at exactly each 15-minute timestamp (with NaN where no observation exists at that timestamp):

In [42]: new_rng = pd.date_range('1/1/2011', periods=20, freq='15min')

In [43]: df.reindex(new_rng)
Out[43]:
                            A         B         C         D
2011-01-01 00:00:00 -0.415637 -1.345454  1.151189 -0.834548
2011-01-01 00:15:00  0.221777 -0.866306  0.932487 -1.243176
2011-01-01 00:30:00 -0.690039  0.778672 -0.527087 -0.156369
...
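
To make the difference on an irregular series concrete, here is a small sketch under assumed data (the timestamps, values, and the names `irregular`, `grid`, `exact` and `binned` are made up for illustration). `reindex` keeps only observations that sit exactly on the 15-minute grid, while `resample(...).first()` takes the first observation inside each 15-minute bin.

```python
import pandas as pd

# Hypothetical irregular observations; only 00:00 and 00:15 are on the grid.
idx = pd.to_datetime([
    "2011-01-01 00:00",
    "2011-01-01 00:07",
    "2011-01-01 00:15",
    "2011-01-01 00:22",
    "2011-01-01 00:31",
])
irregular = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# Exact 15-minute timestamps to look up.
grid = pd.date_range("2011-01-01", periods=3, freq="15min")

# reindex: exact matches only, so 00:30 has no data and becomes NaN.
exact = irregular.reindex(grid)

# resample().first(): first value in each bin, so 00:30 is filled from 00:31.
binned = irregular.resample("15min").first()

print(exact)
print(binned)
```

Which behaviour is preferable depends on whether a value observed at 00:31 should be allowed to stand in for 00:30.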
joris answered Nov 06 '22 00:11


The asfreq() method doesn't do any aggregation; it returns the values at the timestamps of the new frequency:

df.asfreq('15min')
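
A minimal check of this, assuming a frame built like the one in the question (the seed and the names `out` and `matches` are illustrative only): on the regular 5-minute grid, `asfreq('15min')` returns every third original row unchanged.

```python
import numpy as np
import pandas as pd

# Rebuild a 5-minute frame like the one in the question (the seed is arbitrary).
rng = pd.date_range("2011-01-01", periods=60, freq="5min")
values = np.random.default_rng(1).standard_normal((60, 4))
df = pd.DataFrame(values, index=rng, columns=["A", "B", "C", "D"])

# Select the rows on the 15-minute grid without aggregating anything.
out = df.asfreq("15min")

# The result holds the original, untouched values of every third row.
matches = np.array_equal(out.to_numpy(), df.iloc[::3].to_numpy())

print(out.head(3))
print(matches)
```

Like reindex, asfreq yields NaN for any grid timestamp that has no row in the original data.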
Borja answered Nov 05 '22 23:11