
pandas: make a date histogram, given a start and end date?

Tags: pandas

I have this data:

df = pd.DataFrame({'start_date': ['2019/12/01 01:00:00', '2019/12/05 01:00:00', 
                                  '2019/12/01 01:00:00', '2019/12/01 01:00:00'],
                   'end_date': ['2019/12/05 10:00:00', '2019/12/09 10:00:00', 
                                '2019/12/11 10:00:00', '2019/12/09 01:00:00'],
                   'campaign_id' : [1,2,3,4]})

I'd like to plot the number of campaigns active each day from 2019/12/01 to 2019/12/11.

How can I do this? It's like a histogram over dates, except that each row spans a range of dates rather than a single one.

I've got as far as converting the columns to timestamps:

df.start_date = df.start_date.astype('datetime64[ns]')
df.end_date = df.end_date.astype('datetime64[ns]')

Perhaps I need a new column that is pd.date_range, then I'll be able to do some clever pandas grouping...?

df["date_range"] = pd.date_range(df.start_date, df.end_date)

But that gives me an error.

I guess a more manual approach would be to make a new dataframe with a row for each day, then a histogram of that?

Richard asked Sep 13 '25 00:09


1 Answer

Maybe this:

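# Build one daily-indexed Series per campaign, stack them, and name the
# index 'date' so that unstack('date') can find that level.
pd.concat([
    pd.Series(x.campaign_id, index=pd.date_range(x.start_date, x.end_date, freq='D'))
    for i, x in df.iterrows()
]).rename_axis('date').groupby(level=0).value_counts().unstack('date').plot.bar()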

Or this:

df['start_date'] = pd.to_datetime(df['start_date']).dt.normalize()
df['end_date'] = pd.to_datetime(df['end_date']).dt.normalize()

(df.assign(dummy=1)
   .merge(pd.DataFrame({'dummy':1,
                        'date': pd.date_range('2019-12-01', '2019-12-11', freq='D')}),
          on='dummy'
         )
   .query('start_date<=date<=end_date')
   .groupby('date')['campaign_id']
   .value_counts()
   .unstack('date')
   .plot.bar()
)

Output:

[figure: bar chart output]

Or remove 'date' inside unstack() to get counts by date:

[figure: bar chart output, counts by date]
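On pandas 1.2 or later, the dummy-key merge used above can probably be written as a cross join instead. This is only a sketch on the question's sample data; the names days, active and per_campaign are made up for illustration and are not from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'start_date': ['2019/12/01 01:00:00', '2019/12/05 01:00:00',
                   '2019/12/01 01:00:00', '2019/12/01 01:00:00'],
    'end_date': ['2019/12/05 10:00:00', '2019/12/09 10:00:00',
                 '2019/12/11 10:00:00', '2019/12/09 01:00:00'],
    'campaign_id': [1, 2, 3, 4],
})

# Truncate to midnight so the comparison below is by calendar day.
df['start_date'] = pd.to_datetime(df['start_date']).dt.normalize()
df['end_date'] = pd.to_datetime(df['end_date']).dt.normalize()

# One row per day in the reporting window (hypothetical helper frame).
days = pd.DataFrame({'date': pd.date_range('2019-12-01', '2019-12-11', freq='D')})

# how='cross' pairs every campaign with every day; no dummy column needed.
active = (df.merge(days, how='cross')
            .query('start_date <= date <= end_date'))

# Rows are dates, columns are campaign ids, values are 0/1 activity flags.
per_campaign = (active.groupby('date')['campaign_id']
                      .value_counts()
                      .unstack(fill_value=0))
```

Calling per_campaign.plot.bar(stacked=True) should then draw one stacked bar per day, with the bar height equal to the number of active campaigns.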

Or if you are interested in the total daily events:

(df.assign(dummy=1)
   .merge(pd.DataFrame({'dummy':1,
                        'date': pd.date_range('2019-12-01', '2019-12-11', freq='D')}),
          on='dummy'
         )
   .query('start_date<=date<=end_date')
   .groupby('date')['campaign_id']
   .count()
   .plot.bar()
)

Output:

[figure: bar chart of total daily events]
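A different route to the same daily totals, not part of the answer above and offered only as a sketch: instead of pairing every campaign with every day, record +1 on each start day and -1 on the day after each end day, then take a running sum. The names opened, closed, delta and daily_total are hypothetical, and the window dates are taken from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'start_date': ['2019/12/01 01:00:00', '2019/12/05 01:00:00',
                   '2019/12/01 01:00:00', '2019/12/01 01:00:00'],
    'end_date': ['2019/12/05 10:00:00', '2019/12/09 10:00:00',
                 '2019/12/11 10:00:00', '2019/12/09 01:00:00'],
    'campaign_id': [1, 2, 3, 4],
})

# Compare by calendar day, so drop the time-of-day part.
start = pd.to_datetime(df['start_date']).dt.normalize()
end = pd.to_datetime(df['end_date']).dt.normalize()

# Reporting window from the question.
days = pd.date_range('2019-12-01', '2019-12-11', freq='D')

# +1 on each start day, -1 on the day after each end day.
opened = start.value_counts()
closed = (end + pd.Timedelta(days=1)).value_counts()
delta = opened.sub(closed, fill_value=0)

# Place the changes on a sorted index that also contains every day in the
# window, accumulate them, then keep only the window.
full_index = delta.index.union(days)
daily_total = (delta.reindex(full_index, fill_value=0)
                    .cumsum()
                    .reindex(days)
                    .astype(int))

print(daily_total.tolist())  # [3, 3, 3, 3, 4, 3, 3, 3, 3, 1, 1]
```

daily_total.plot.bar() would plot it. The intermediate data here grows with the number of campaigns plus the number of days rather than their product, which may matter for long date ranges.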

Quang Hoang answered Sep 17 '25 20:09


