How to bin time in a pandas dataframe

Tags: python, datetime, pandas, pandas-groupby

I am trying to analyze average daily fluctuations in a measurement "X" over several weeks using pandas dataframes; however, timestamps/datetimes etc. are proving particularly hellish to deal with. Having spent a good few hours trying to work this out, my code is getting messier and messier and I don't think I'm any closer to a solution, so I'm hoping someone here can guide me in the right direction.

I have measured X at different times and on different days, saving the daily results to a dataframe which has the form:

```
   Timestamp(datetime64)  X
0  2015-10-05 00:01:38    1
1  2015-10-05 06:03:39    4
2  2015-10-05 13:42:39    3
3  2015-10-05 22:15:39    2
```

Because the time of day at which the measurement is made changes from day to day, I decided to bin the data by time of day, then work out the average and standard deviation (STD) of X for each bin, which I can then plot. My idea was to create a final dataframe with the bins and the average value of X in each; the 'Observations' column is only there to aid understanding:

```
      Time Bin   Observations  <X>
0  00:00-05:59   [1, ...]      2.3
1  06:00-11:59   [4, ...]      4.6
2  12:00-17:59   [3, ...]      8.5
3  18:00-23:59   [2, ...]      3.1
```

However, I've run into difficulties with incompatibilities between time, datetime, datetime64 and timedelta, and with binning using pd.cut and DataFrame.groupby. Basically, I feel like I'm making stabs in the dark with no idea of the 'right' way to approach this problem. The only solution I can think of is to iterate through the dataframe row by row, but I'd really like to avoid that.
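One way to sidestep the mix of time/datetime/timedelta types is to reduce each timestamp to a plain integer, such as minutes since midnight, which `pd.cut` bins like any other number. The sketch below is an illustration, not the answerer's method: it rebuilds the four sample rows from the question (column names `Timestamp` and `X` as shown there), and the minute-based bin edges are an assumption chosen to mirror the four 6-hour bins.

```python
import pandas as pd

# The four sample rows from the question
df = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2015-10-05 00:01:38', '2015-10-05 06:03:39',
                                 '2015-10-05 13:42:39', '2015-10-05 22:15:39']),
    'X': [1, 4, 3, 2],
})

# .dt.time yields datetime.time objects (object dtype); they are not numeric,
# which makes them awkward to pass to pd.cut
times = df['Timestamp'].dt.time
print(times.dtype)

# Plain integers avoid the type mismatch: minutes elapsed since midnight (0-1439).
# The date part is dropped, so measurements from different days line up.
minute_of_day = df['Timestamp'].dt.hour * 60 + df['Timestamp'].dt.minute

# Bin edges in minutes; right=False gives [0, 360), [360, 720), [720, 1080), [1080, 1440)
bins = [0, 6 * 60, 12 * 60, 18 * 60, 24 * 60]
labels = ['00:00-05:59', '06:00-11:59', '12:00-17:59', '18:00-23:59']
df['Time Bin'] = pd.cut(minute_of_day, bins, labels=labels, right=False)

print(df)
```

Working in minutes rather than whole hours also allows bin edges that do not fall on the hour, e.g. 7 * 60 + 30 for 07:30.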


asked Oct 15 '15 14:10

Josh

1 Answer

- The idiomatic way to bin the values of a pandas.DataFrame column is pandas.cut.
- Make sure the date column has a datetime dtype, converting it with pandas.to_datetime if necessary.
- Use .dt.hour to extract the hour of the day, and pass that to pandas.cut.
- Tested in Python 3.8.11 and pandas 1.3.1.
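As a minimal illustration of the conversion and hour-extraction steps, the sketch below assumes a hypothetical `raw` frame whose timestamps were read in as strings (as often happens when loading a CSV); the two values are taken from the question's sample.

```python
import pandas as pd

# Hypothetical raw data where the timestamps are still strings
raw = pd.DataFrame({'date': ['2015-10-05 00:01:38', '2015-10-05 13:42:39'],
                    'x': [1, 3]})

# Convert the column to datetime64 so the .dt accessor becomes available
raw['date'] = pd.to_datetime(raw['date'])

# .dt.hour returns the hour of day (0-23) as integers, which pd.cut can bin
hours = raw['date'].dt.hour
print(hours.tolist())  # [0, 13]
```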

How to `bin` the data

```python
import pandas as pd
import numpy as np  # for test data

# set up a sample dataframe: 1100 hourly observations (about 1.5 months)
np.random.seed(365)
data = {'date': pd.date_range('2020-09-21', freq='h', periods=1100),
        'x': np.random.randint(10, size=1100)}
df = pd.DataFrame(data)

# the date column of the sample data is already datetime64;
# if your date column is not (e.g. it holds strings), uncomment the following line
# df['date'] = pd.to_datetime(df['date'])

# define the bin edges, in hours of the day
bins = [0, 6, 12, 18, 24]

# add custom labels if desired
labels = ['00:00-05:59', '06:00-11:59', '12:00-17:59', '18:00-23:59']

# add the bins to the dataframe; right=False makes the bins [0, 6), [6, 12), [12, 18), [18, 24)
df['Time Bin'] = pd.cut(df['date'].dt.hour, bins, labels=labels, right=False)

print(df.head())
print(df.tail())
```

Output of `df.head()`:

```
                  date  x     Time Bin
0  2020-09-21 00:00:00  2  00:00-05:59
1  2020-09-21 01:00:00  4  00:00-05:59
2  2020-09-21 02:00:00  1  00:00-05:59
3  2020-09-21 03:00:00  5  00:00-05:59
4  2020-09-21 04:00:00  2  00:00-05:59
```

Output of `df.tail()`:

```
                    date  x     Time Bin
1095 2020-11-05 15:00:00  2  12:00-17:59
1096 2020-11-05 16:00:00  3  12:00-17:59
1097 2020-11-05 17:00:00  1  12:00-17:59
1098 2020-11-05 18:00:00  2  18:00-23:59
1099 2020-11-05 19:00:00  2  18:00-23:59
```
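The `right=False` argument matters at the bin edges. The small sketch below, using a handful of hypothetical hour values, contrasts it with the default `right=True`, under which the intervals are closed on the right, so hour 0 falls outside every bin and hour 6 lands in the first bin.

```python
import pandas as pd

bins = [0, 6, 12, 18, 24]
labels = ['00:00-05:59', '06:00-11:59', '12:00-17:59', '18:00-23:59']
hours = pd.Series([0, 5, 6, 23])  # hypothetical hours, chosen to sit on or near edges

# right=False: intervals [0, 6), [6, 12), [12, 18), [18, 24); hour 6 -> '06:00-11:59'
left_closed = pd.cut(hours, bins, labels=labels, right=False)

# default right=True: intervals (0, 6], (6, 12], ...; hour 0 -> NaN, hour 6 -> '00:00-05:59'
right_closed = pd.cut(hours, bins, labels=labels)

print(pd.DataFrame({'hour': hours, 'right=False': left_closed, 'right=True': right_closed}))
```

For hour-of-day data, `right=False` is what makes the labels such as '06:00-11:59' accurate.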

Groupby `'Time Bin'`

Use pandas.DataFrame.groupby on 'Time Bin', then aggregate 'x' into a list of the observations and their mean.

```python
# group by Time Bin and aggregate the observations into a list, plus their mean;
# observed=False keeps all four bins, even one that happens to contain no rows
dfg = df.groupby('Time Bin', observed=False)['x'].agg([list, 'mean'])

# rename the columns, if desired
dfg.columns = ['X Observations', 'X mean']

print(dfg)
```

Output:

```
                      X Observations    X mean
Time Bin                                 
00:00-05:59  [2, 4, 1, 5, 2, 2, ...]  4.416667
06:00-11:59  [9, 8, 4, 0, 3, 3, ...]  4.760870
12:00-17:59  [7, 7, 7, 0, 8, 4, ...]  4.384058
18:00-23:59  [3, 2, 6, 2, 6, 8, ...]  4.459559
```
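The question also asks for the standard deviation per bin. A minimal sketch, assuming a small hand-made data set (the values are hypothetical, chosen so the results are easy to check), adds `'count'` and `'std'` to the same aggregation:

```python
import pandas as pd

# Hypothetical measurements spanning two days, one per 6-hour bin per day
df = pd.DataFrame({
    'date': pd.to_datetime(['2015-10-05 01:00', '2015-10-05 07:00', '2015-10-05 13:00',
                            '2015-10-05 19:00', '2015-10-06 02:00', '2015-10-06 08:00',
                            '2015-10-06 14:00', '2015-10-06 20:00']),
    'x': [1, 4, 3, 2, 3, 6, 5, 4],
})
bins = [0, 6, 12, 18, 24]
labels = ['00:00-05:59', '06:00-11:59', '12:00-17:59', '18:00-23:59']
df['Time Bin'] = pd.cut(df['date'].dt.hour, bins, labels=labels, right=False)

# number of observations, mean and (sample) standard deviation of x per bin;
# observed=False keeps every bin, even one with no rows
stats = df.groupby('Time Bin', observed=False)['x'].agg(['count', 'mean', 'std'])
print(stats)
```

pandas computes `std` with `ddof=1` (the sample standard deviation) by default. For plotting, something like `stats['mean'].plot.bar(yerr=stats['std'])` should draw the bin means with error bars, assuming matplotlib is installed.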

answered Oct 07 '22 05:10

Trenton McKinney
