
How to group a time series by 10 minutes using pandas

I have a time series (ts) indexed by a DatetimeIndex and want to group it by 10 minutes.

index   x  y  z

ts1     ....
ts2     ....
...

I know how to group by 1 minute:

import datetime

def group_by_minute(timestamp):
    # Truncate the timestamp to the minute so rows in the same minute share a key.
    year = timestamp.year
    month = timestamp.month
    day = timestamp.day
    hour = timestamp.hour
    minute = timestamp.minute
    return datetime.datetime(year, month, day, hour, minute)

then

ts.groupby(group_by_minute, axis=0)

My custom function is, roughly:

def my_function(group):
    first_latitude = group['latitude'].sort_index().head(1).values[0]
    last_longitude = group['longitude'].sort_index().tail(1).values[0]
    return first_latitude - last_longitude

So the ts DataFrame definitely contains 'latitude' and 'longitude' columns.

When I use TimeGrouper:

ts.groupby(pd.TimeGrouper(freq='100min')).apply(my_function)

I get the following error:

TypeError: cannot concatenate a non-NDFrame object
Hello lad asked Aug 21 '15 18:08


1 Answer

There is pandas.TimeGrouper for this sort of thing. What you described would be something like:

agg_10m = df.groupby(pd.TimeGrouper(freq='10Min')).aggregate(numpy.sum)  # or another aggregation function
CT Zhu answered Oct 13 '22 23:10
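
Note for newer pandas releases: pd.TimeGrouper was deprecated and later removed, and pd.Grouper(freq=...) takes its place. Below is a minimal sketch of the same 10-minute grouping with pd.Grouper, applied to the custom function from the question. The sample data and the empty-bin guard are assumptions added for illustration. They are not part of the original question or answer, and the guard is not a confirmed explanation of the TypeError.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one row per minute for 30 minutes.
index = pd.date_range("2015-08-21 18:00", periods=30, freq="min")
ts = pd.DataFrame(
    {"latitude": np.arange(30.0), "longitude": np.arange(30.0) * 2},
    index=index,
)


def my_function(group):
    # Assumption: a time bin may contain no rows, so return NaN for it
    # and keep every result a scalar.
    if group.empty:
        return np.nan
    group = group.sort_index()
    # First latitude minus last longitude, as in the question.
    return group["latitude"].iloc[0] - group["longitude"].iloc[-1]


# pd.Grouper(freq=...) bins the DatetimeIndex into 10-minute windows.
result = ts.groupby(pd.Grouper(freq="10min")).apply(my_function)

# Built-in reductions can be called directly on the grouped object.
agg_10m = ts.groupby(pd.Grouper(freq="10min")).sum()

print(result)
print(agg_10m)
```

Here `result` holds one value per 10-minute bin, and `agg_10m` holds the per-bin column sums. Another aggregation such as `.mean()` could be substituted for `.sum()`.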
