I have a DataFrame with two columns:
userID duration
0 DSm7ysk 03:08:49
1 no51CdJ 00:35:50
2 ...
where 'duration' is of type timedelta. I have tried:
import datetime as dt
import pandas as pd

bins = [dt.timedelta(minutes=0), dt.timedelta(minutes=5),
        dt.timedelta(minutes=10), dt.timedelta(minutes=20),
        dt.timedelta(minutes=30), dt.timedelta(hours=4)]
labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']
df['bins'] = pd.cut(df['duration'], bins, labels=labels)
However, the result doesn't use the specified bins; instead, a separate bin is created for each duration in the frame.
What is the simplest way to bin timedelta objects into irregular bins? Or am I just missing something obvious here?
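If `pd.cut` does not honour timedelta edges in a particular pandas version, one version-independent workaround (an alternative technique, not the one the answer uses) is to convert each duration to a plain number of minutes and cut on those floats. A minimal sketch; the sample rows are hypothetical and mirror the question's data:

```python
import pandas as pd

# Hypothetical sample data shaped like the DataFrame in the question
df = pd.DataFrame({
    'userID': ['DSm7ysk', 'no51CdJ', 'bar'],
    'duration': pd.to_timedelta(['03:08:49', '00:35:50', '00:06:43']),
})

# Convert each timedelta to a float number of minutes
minutes = df['duration'].dt.total_seconds() / 60

# Irregular edges in minutes; 240 minutes = 4 hours
edges = [0, 5, 10, 20, 30, 240]
labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

# Right-closed intervals: (0, 5], (5, 10], (10, 20], (20, 30], (30, 240]
df['bins'] = pd.cut(minutes, edges, labels=labels)
print(df)
```

Because the edges are ordinary numbers here, this behaves the same regardless of how a given pandas version handles timedelta bins.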
In pandas, binning by value is done with the cut() function. For example, the values of a column named Cupcake can be grouped into three groups: small, medium and big. To do this, we need to calculate the intervals within which each group falls.
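A minimal sketch of the Cupcake example, assuming hypothetical values. Passing an integer `bins=3` lets `pd.cut` calculate three equal-width intervals from the column's minimum and maximum:

```python
import pandas as pd

# Hypothetical data: a numeric column named Cupcake
df = pd.DataFrame({'Cupcake': [3, 12, 25, 40, 58, 71]})

# bins=3 splits the range [3, 71] into three equal-width intervals;
# labels name the intervals in ascending order
df['size'] = pd.cut(df['Cupcake'], bins=3, labels=['small', 'medium', 'big'])
print(df)
```

With these values the interval edges are roughly 3, 25.67, 48.33 and 71, so 25 still counts as small. For irregular groups, pass an explicit list of edges instead of an integer, as the question does.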
Use numpy.digitize() to put data into bins. Call numpy.digitize(x, bins) with x as a NumPy array and bins as a monotonic list of bin edges (the start and end points of the bins). Each element of the resulting array is the index of the bin that the corresponding element of x falls into.
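A small sketch of `numpy.digitize` applied to durations already converted to minutes (the values are hypothetical). Note that, unlike `pd.cut`, the default intervals are closed on the left, and index 0 means "below the first edge":

```python
import numpy as np

# Hypothetical durations expressed in minutes
x = np.array([1.2, 6.7, 35.8, 188.8])

# Monotonically increasing bin edges, in minutes
bins = [0, 5, 10, 20, 30, 240]

# With the default right=False, index i means bins[i-1] <= value < bins[i]
idx = np.digitize(x, bins)
print(idx)

# Map bin indices to names; idx - 1 because index 1 is the first real bin
labels = np.array(['0-5min', '5-10min', '10-20min', '20-30min', '30min+'])
print(labels[idx - 1])
```

Values below 0 would get index 0 and values of 240 or more would get index 6, so the label lookup above assumes all inputs lie inside the edges.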
The qcut() function discretizes a variable into equal-sized buckets based on rank or on sample quantiles. For example, 1000 values for 10 quantiles would produce a Categorical object indicating quantile membership for each data point.
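The 1000-values, 10-quantiles example can be checked directly. A minimal sketch using hypothetical evenly spaced data, so every decile ends up with the same count:

```python
import numpy as np
import pandas as pd

# 1000 distinct hypothetical values
values = np.arange(1000)

# Split into 10 quantile-based buckets; for array input this returns a Categorical
deciles = pd.qcut(values, 10)

# Each bucket holds (roughly) the same number of points
counts = pd.Series(deciles).value_counts()
print(counts)
```

This contrasts with `cut()`, where the edges are fixed and the counts per bin vary; for the question's irregular, fixed duration ranges, `cut()` is the right tool.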
Timedelta represents a duration, the difference between two dates or times. Timedelta is the pandas equivalent of Python's datetime.timedelta and is interchangeable with it in most cases.
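A short illustration of that interchangeability, which is also why the question's `dt.timedelta` edges and the answer's `pd.Timedelta` edges describe the same bins:

```python
import datetime as dt
import pandas as pd

td_pandas = pd.Timedelta(minutes=5)
td_python = dt.timedelta(minutes=5)

# A pandas Timedelta compares equal to the equivalent standard-library timedelta
print(td_pandas == td_python)  # True

# pd.Timedelta is a subclass of datetime.timedelta
print(isinstance(td_pandas, dt.timedelta))  # True

# Mixing both types in arithmetic yields a pandas Timedelta
total = td_pandas + dt.timedelta(seconds=30)
print(total)  # 0 days 00:05:30
print(total.total_seconds())  # 330.0
```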
It works for me with pandas 0.23.4:
import pandas as pd

df = pd.DataFrame({
    'userID': ['DSm7ysk', 'no51CdJ', 'foo', 'bar'],
    'duration': [pd.Timedelta('3 hours 8 minutes 49 seconds'),
                 pd.Timedelta('35 minutes 50 seconds'),
                 pd.Timedelta('1 minutes 13 seconds'),
                 pd.Timedelta('6 minutes 43 seconds')]
})

# Irregular bin edges; pd.cut builds right-closed intervals such as (0, 5] minutes
bins = [
    pd.Timedelta(minutes=0),
    pd.Timedelta(minutes=5),
    pd.Timedelta(minutes=10),
    pd.Timedelta(minutes=20),
    pd.Timedelta(minutes=30),
    pd.Timedelta(hours=4)
]
labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']
df['bins'] = pd.cut(df['duration'], bins, labels=labels)
print(df)
Result: the bins column is 30min+ for DSm7ysk and no51CdJ, 0-5min for foo and 5-10min for bar.
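One detail worth knowing with these edges: `pd.cut` intervals are closed on the right by default, so a duration of exactly zero falls outside (0, 5] and becomes NaN, while a duration of exactly 5 minutes lands in 0-5min. A minimal sketch with hypothetical edge-case durations, using the documented `include_lowest` parameter to keep zero-length durations:

```python
import pandas as pd

# Hypothetical durations: exactly zero, exactly on the 5-minute edge, just past it
durations = pd.Series(pd.to_timedelta(['00:00:00', '00:05:00', '00:05:01']))

bins = [pd.Timedelta(minutes=m) for m in (0, 5, 10, 20, 30)] + [pd.Timedelta(hours=4)]
labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

# Default right-closed intervals (0, 5], (5, 10], ...: the zero duration is NaN
default = pd.cut(durations, bins, labels=labels)

# include_lowest=True also includes the first edge, so zero maps to 0-5min
inclusive = pd.cut(durations, bins, labels=labels, include_lowest=True)

print(default.tolist())
print(inclusive.tolist())
```

Passing `right=False` instead would switch every interval to closed-on-the-left, e.g. [0, 5), which changes where the exact 5-minute duration lands.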