
Python/Pandas Binning Data Timedelta

I have a DataFrame with two columns

    userID     duration
0   DSm7ysk    03:08:49
1   no51CdJ    00:35:50
2   ...

with 'duration' having type timedelta. I have tried using

bins = [dt.timedelta(minutes=0), dt.timedelta(minutes=5), dt.timedelta(minutes=10),
        dt.timedelta(minutes=20), dt.timedelta(minutes=30), dt.timedelta(hours=4)]

labels = ['0-5min','5-10min','10-20min','20-30min','30min+']

df['bins'] = pd.cut(df['duration'], bins, labels = labels)

However, the binned data doesn't use the specified bins; instead, a separate bin is created for each duration in the frame.

What is the simplest way to bin timedelta objects into irregular bins? Or am I just missing something obvious here?
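One thing that may be worth ruling out before binning is whether 'duration' really has pandas' native timedelta64 dtype, or is an object column holding plain datetime.timedelta values. The sketch below is only an illustration: the frame is built by hand with object dtype on purpose, and nothing in the question confirms that this is the cause of the behaviour described.

```python
import datetime as dt
import pandas as pd

# Hypothetical frame: 'duration' is forced to object dtype so that it holds
# plain datetime.timedelta objects instead of pandas' timedelta64 values.
df = pd.DataFrame({
    'userID': ['DSm7ysk', 'no51CdJ'],
    'duration': pd.Series(
        [dt.timedelta(hours=3, minutes=8, seconds=49),
         dt.timedelta(minutes=35, seconds=50)],
        dtype=object,
    ),
})

dtype_before = df['duration'].dtype
print(dtype_before)  # object

# Coerce the column to pandas' native timedelta dtype.
df['duration'] = pd.to_timedelta(df['duration'])
print(df['duration'].dtype)
```

If the second print shows a timedelta64 dtype, the column is in the form that pd.cut is expected to handle with Timedelta bin edges.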

asked Oct 25 '17 10:10 by cmf05



1 Answer

Your approach works for me with pandas 0.23.4:

import pandas as pd

df = pd.DataFrame({
    'userID': ['DSm7ysk', 'no51CdJ', 'foo', 'bar'],
    'duration': [
        pd.Timedelta('3 hours 8 minutes 49 seconds'),
        pd.Timedelta('35 minutes 50 seconds'),
        pd.Timedelta('1 minutes 13 seconds'),
        pd.Timedelta('6 minutes 43 seconds'),
    ],
})

bins = [
    pd.Timedelta(minutes = 0),
    pd.Timedelta(minutes = 5),
    pd.Timedelta(minutes = 10),
    pd.Timedelta(minutes = 20),
    pd.Timedelta(minutes = 30),
    pd.Timedelta(hours = 4)
]

labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

df['bins'] = pd.cut(df['duration'], bins, labels = labels)

Result: [image of the resulting DataFrame, not preserved]
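If binning the Timedelta column directly misbehaves on a particular pandas version, a possible workaround (not part of the answer above) is to convert the durations to minutes as plain floats and bin those instead. The sketch below reuses the sample data and labels from the answer; using np.inf as the last edge and passing include_lowest=True are assumptions added here, chosen so that durations over 4 hours and durations of exactly zero still land in a bin.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'userID': ['DSm7ysk', 'no51CdJ', 'foo', 'bar'],
    'duration': pd.to_timedelta(['03:08:49', '00:35:50', '00:01:13', '00:06:43']),
})

# Durations as float minutes, so pd.cut only sees ordinary numbers.
minutes = df['duration'].dt.total_seconds() / 60

# Bin edges in minutes. np.inf leaves the last bin open-ended (an assumption;
# the answer's code caps the last bin at 4 hours).
edges = [0, 5, 10, 20, 30, np.inf]
labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

# include_lowest=True also places a duration of exactly zero in the first bin.
df['bins'] = pd.cut(minutes, bins=edges, labels=labels, include_lowest=True)
print(df)
```

With the sample data, the first two users fall into '30min+', 'foo' into '0-5min', and 'bar' into '5-10min'.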

answered Sep 28 '22 18:09 by godfryd