I have this data:
import pandas as pd
df = pd.DataFrame({'start_date': ['2019/12/01 01:00:00', '2019/12/05 01:00:00',
                                  '2019/12/01 01:00:00', '2019/12/01 01:00:00'],
                   'end_date': ['2019/12/05 10:00:00', '2019/12/09 10:00:00',
                                '2019/12/11 10:00:00', '2019/12/09 01:00:00'],
                   'campaign_id': [1, 2, 3, 4]})
I'd like to plot the number of campaigns active each day from 2019/12/01 to 2019/12/11.
How can I do this? It's like a histogram over dates, except that each row covers a whole range of dates.
I've got as far as converting the columns to timestamps:
df.start_date = df.start_date.astype('datetime64[ns]')
df.end_date = df.end_date.astype('datetime64[ns]')
Perhaps I need a new column that is a pd.date_range, then I'll be able to do some clever pandas grouping...?
df["date_range"] = pd.date_range(df.start_date, df.end_date)
But that gives me an error.
I guess a more manual approach would be to make a new dataframe with a row for each day, then a histogram of that?
Maybe this:
(pd.concat([
    pd.Series(x.campaign_id, index=pd.date_range(x.start_date, x.end_date, freq='D'))
    for i, x in df.iterrows()
 ])
 .rename_axis('date')  # name the index so that unstack('date') can find it
 .groupby(level=0).value_counts().unstack('date').plot.bar()
)
Or this:
df['start_date'] = pd.to_datetime(df['start_date']).dt.normalize()
df['end_date'] = pd.to_datetime(df['end_date']).dt.normalize()
(df.assign(dummy=1)
   .merge(pd.DataFrame({'dummy': 1,
                        'date': pd.date_range('2019-12-01', '2019-12-11', freq='D')}),
          on='dummy')
   .query('start_date<=date<=end_date')
   .groupby('date')['campaign_id']
   .value_counts()
   .unstack('date')
   .plot.bar()
)
Or call unstack() without the 'date' argument to put the dates on the x-axis, with one bar per campaign.
Or if you are interested in the total daily events:
(df.assign(dummy=1)
   .merge(pd.DataFrame({'dummy': 1,
                        'date': pd.date_range('2019-12-01', '2019-12-11', freq='D')}),
          on='dummy')
   .query('start_date<=date<=end_date')
   .groupby('date')['campaign_id']
   .count()
   .plot.bar()
)
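Another option, closer to the date_range column idea from the question: pd.date_range expects scalar start and end values, which is presumably why passing whole columns raised an error. A range can instead be built per row and then expanded with explode(). This is only a sketch, assuming pandas 1.1 or later (for ignore_index) and that both the start day and the end day count as active; the names long and daily are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'start_date': ['2019/12/01 01:00:00', '2019/12/05 01:00:00',
                   '2019/12/01 01:00:00', '2019/12/01 01:00:00'],
    'end_date': ['2019/12/05 10:00:00', '2019/12/09 10:00:00',
                 '2019/12/11 10:00:00', '2019/12/09 01:00:00'],
    'campaign_id': [1, 2, 3, 4],
})

# Drop the time of day so that the start and end days are both included.
start = pd.to_datetime(df['start_date']).dt.normalize()
end = pd.to_datetime(df['end_date']).dt.normalize()

# Build one daily range per row, then expand each range
# into one row per (campaign, day).
long = (
    df.assign(date=[pd.date_range(s, e, freq='D') for s, e in zip(start, end)])
      .explode('date', ignore_index=True)
)
long['date'] = pd.to_datetime(long['date'])  # explode() leaves object dtype

# Number of campaigns active on each day.
daily = long.groupby('date')['campaign_id'].count()
print(daily)

# daily.plot.bar()  # uncomment to plot; needs matplotlib
```

With the sample data, 2019-12-05 comes out as the busiest day, with four active campaigns.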