Get date quantiles in pandas

Question

I have some data that looks something like:

user  timestamp  value1 
   a 2007-01-01       7 
   a 2007-02-02       8 
   a 2007-02-03       9 
   b 2007-02-04       1 
   a 2007-02-05       2 
   b 2007-02-06       3 
   b 2007-02-07       4 
   a 2007-02-08       5 
...

where each user has a different number of entries.

My goal is to have a sense of how quickly these entries are being produced, and output something like:

     last_entry median_entry first_entry
user                                    
a    2007-02-08   2007-02-03  2007-01-01
b    2007-02-07   2007-02-06  2007-02-04

So far, I have code like:

gb = df.groupby('user')
time_median = gb['timestamp'].median()

But this gives me DataError: No numeric types to aggregate, presumably because dates are not numeric.

I guess it would be possible to turn the dates into timestamps, and find the median of those, and then turn them back into date_time objects. Is that the best way?

Jeremy · Accepted Answer

I may have been unclear in my question, but I found a solution that works for me.

def get_quantile(df, q):
    # Function that gets quantile from integer timestamp, then changes
    # back to a date_time object
    return pd.to_datetime(df['timestamp'].quantile(q, interpolation='nearest'))

df = pd.DataFrame(data={'user': np.random.choice(['a', 'b','c'], size=100, replace=True), 'value': np.random.random(size=100), 'date_time': pd.date_range(start=date(2016, 1,1), freq='D', periods=100)})

# Make a column of integer timestamps
df['timestamp'] = df['date_time'].astype('int')

editors = d.groupby('editor')

result = pd.DataFrame()
# Add columns of quantiles
result['first_quantile'] = get_quantile(editors, .25)
etc.

mgilbert · Answer

Assuming you want to treat each date after the initial date for each user as the number of days since this initial date, you could do something like

import pandas as pd
dts =  pd.date_range(start="2015-01-15", periods=20)
users = ["a","b"]*10
df = pd.DataFrame({"user":users, "timestamp":dts})

date_info = df.groupby("user").agg({"timestamp":[min, max]})
date_info.columns = date_info.columns.droplevel()

since_incept = lambda x: x - x.min()
df["days"] = df.groupby("user").transform(since_incept)
df["days"] = df["days"].dt.days

median_td = lambda x: pd.Timedelta(pd.Series.median(x), "D")
med = df.groupby("user").agg({"days":[median_td]})

date_info["median"] = date_info["min"] + med.loc[:, ("days", "<lambda>")]

Get date quantiles in pandas

Tags:

python

pandas

numpy

Jeremy

2 Answers

Jeremy

mgilbert

Recent Activity

Donate For Us

Get date quantiles in pandas

Tags:

python

pandas

numpy

Jeremy

2 Answers

Jeremy

mgilbert

Related questions

Recent Activity

Donate For Us