I have a large pandas DataFrame with the columns timestamp, name, and value:
index timestamp name value
0 1999-12-31 23:59:59.000107 A 16
1 1999-12-31 23:59:59.000385 B 12
2 1999-12-31 23:59:59.000404 C 25
3 1999-12-31 23:59:59.000704 B 15
4 1999-12-31 23:59:59.001281 A 300
5 1999-12-31 23:59:59.002211 C 20
6 1999-12-31 23:59:59.002367 C 3
I want to group by time buckets (say 20 ms or 20 minutes) and by name, and calculate the average value for each group.
What is the most efficient way to do this?
Grouping by multiple columns: you can group on several keys at once by passing a list of column names (or grouper objects) to groupby() instead of a single string.
Before grouping by time, make sure the timestamp column actually holds datetime values. Pandas has a built-in function, to_datetime(), that converts dates and times stored as strings into the datetime64 dtype; if timestamp is still a string-type object column, convert it first, because the time bucketing below relies on a datetime column or index.
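As a minimal sketch of that conversion, using a few rows shaped like the sample data in the question (the values are taken from it for illustration only):

import pandas as pd

# Build a small frame shaped like the question's data.
df = pd.DataFrame({
    'timestamp': ['1999-12-31 23:59:59.000107',
                  '1999-12-31 23:59:59.000385',
                  '1999-12-31 23:59:59.000404',
                  '1999-12-31 23:59:59.000704'],
    'name': ['A', 'B', 'C', 'B'],
    'value': [16, 12, 25, 15],
})

# Convert the string column to datetime64 so time-based grouping works.
df['timestamp'] = pd.to_datetime(df['timestamp'])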
You can use pd.Grouper for the time bucketing. In its simplest form it expects the timestamps on the index, so you could try something like:
df.set_index('timestamp').groupby([pd.Grouper(freq='20min'), 'name']).mean()
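If you would rather not move the timestamp column to the index, pd.Grouper also accepts a key argument naming the column to bucket on. Continuing from the df built above, a sketch with 20 ms buckets (which suits the microsecond-scale sample data; use freq='20min' for 20-minute buckets):

# Group into 20 ms time buckets per name without touching the index.
out = (
    df.groupby([pd.Grouper(key='timestamp', freq='20ms'), 'name'])['value']
      .mean()
      .reset_index()
)
print(out)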