 

Group by time and another column in pandas

Tags:

pandas

I have a large pandas DataFrame with the columns timestamp, name, and value:

index    timestamp                     name   value
0        1999-12-31 23:59:59.000107    A      16
1        1999-12-31 23:59:59.000385    B      12
2        1999-12-31 23:59:59.000404    C      25 
3        1999-12-31 23:59:59.000704    B      15
4        1999-12-31 23:59:59.001281    A      300
5        1999-12-31 23:59:59.002211    C      20
6        1999-12-31 23:59:59.002367    C      3
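The table above can be rebuilt as a reproducible DataFrame. This is a sketch that assumes the timestamps arrive as strings and need converting with pd.to_datetime, because time-based grouping only works on a datetime64 column:

```python
import pandas as pd

# Rebuild the sample rows from the table; the timestamp strings are parsed
# into datetime64 values so that time-based grouping can be applied later.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "1999-12-31 23:59:59.000107", "1999-12-31 23:59:59.000385",
        "1999-12-31 23:59:59.000404", "1999-12-31 23:59:59.000704",
        "1999-12-31 23:59:59.001281", "1999-12-31 23:59:59.002211",
        "1999-12-31 23:59:59.002367",
    ]),
    "name": ["A", "B", "C", "B", "A", "C", "C"],
    "value": [16, 12, 25, 15, 300, 20, 3],
})

print(df.dtypes)
```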

I want to group by time buckets (say 20ms or 20 minutes) and name, and calculate the average value for each group.

What is the most efficient way to do this?
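To pin down what "group by time bucket and name" means, here is a sketch using Series.dt.floor, an alternative to pd.Grouper. Each timestamp is rounded down to the start of its bucket, and that bucket start serves as a group key next to name. The helper name bucket_means is hypothetical, and the sample rows are copied from the table above:

```python
import pandas as pd

# Sample rows from the question's table, with timestamps parsed to datetime64.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "1999-12-31 23:59:59.000107", "1999-12-31 23:59:59.000385",
        "1999-12-31 23:59:59.000404", "1999-12-31 23:59:59.000704",
        "1999-12-31 23:59:59.001281", "1999-12-31 23:59:59.002211",
        "1999-12-31 23:59:59.002367",
    ]),
    "name": ["A", "B", "C", "B", "A", "C", "C"],
    "value": [16, 12, 25, 15, 300, 20, 3],
})


def bucket_means(frame, freq):
    """Hypothetical helper: mean of `value` per (time bucket, name).

    Each timestamp is rounded down to a multiple of `freq` (counted from
    the Unix epoch), and that bucket start is used as the first group key.
    """
    buckets = frame["timestamp"].dt.floor(freq)
    return frame.groupby([buckets, "name"])["value"].mean()


print(bucket_means(df, "20ms"))  # all seven rows fall into one 20 ms bucket
print(bucket_means(df, "1ms"))   # finer buckets split the rows further
```

Both steps are vectorized, so this should scale to large frames, although I have not benchmarked it against pd.Grouper. For bucket widths that divide a day evenly, such as 20 ms or 20 minutes, these epoch-aligned bucket starts should coincide with the bins pd.Grouper produces by default. For other widths the bin edges may differ.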

asked Mar 09 '16 17:03 by volatile




1 Answer

You can use pd.Grouper. By default it bins on the index, so one option is to move the timestamps there with set_index. The column must have a datetime64 dtype, so convert it with pd.to_datetime if needed:

df['timestamp'] = pd.to_datetime(df['timestamp'])  # make sure the column is datetime64
df.set_index('timestamp').groupby([pd.Grouper(freq='20min'), 'name'])['value'].mean()
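pd.Grouper also accepts a key= argument naming a column to bin on, so set_index is not strictly needed. The following sketch uses the question's sample rows and compares both forms for 20-minute buckets. With the default bin origin, the single bucket here should be labelled 23:40:00:

```python
import pandas as pd

# Sample rows from the question's table, with timestamps parsed to datetime64.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "1999-12-31 23:59:59.000107", "1999-12-31 23:59:59.000385",
        "1999-12-31 23:59:59.000404", "1999-12-31 23:59:59.000704",
        "1999-12-31 23:59:59.001281", "1999-12-31 23:59:59.002211",
        "1999-12-31 23:59:59.002367",
    ]),
    "name": ["A", "B", "C", "B", "A", "C", "C"],
    "value": [16, 12, 25, 15, 300, 20, 3],
})

# Bin the "timestamp" column directly by naming it with key=.
by_key = (
    df.groupby([pd.Grouper(key="timestamp", freq="20min"), "name"])["value"].mean()
)

# The index-based form, for comparison.
by_index = (
    df.set_index("timestamp")
    .groupby([pd.Grouper(freq="20min"), "name"])["value"]
    .mean()
)

print(by_key)
print(by_key.equals(by_index))
```

Selecting ['value'] before .mean() limits the aggregation to the one numeric column, which avoids errors from non-numeric columns in newer pandas versions.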
answered Sep 29 '22 16:09 by Gustavo Bezerra