
pandas time-series data preprocessing

I have a DataFrame that looks like this:

> dt
    text    timestamp
0   a   2016-06-13 18:00
1   b   2016-06-20 14:08
2   c   2016-07-01 07:41
3   d   2016-07-11 19:07
4   e   2016-08-01 16:00

I want to summarise the data by month, like this:

> dt_month
count   timestamp
0   2   2016-06
1   2   2016-07
2   1   2016-08

The original dataset (dt) can be generated by:

import pandas as pd
data = {'text': ['a', 'b', 'c', 'd', 'e'],
    'timestamp': ['2016-06-13 18:00', '2016-06-20 14:08', '2016-07-01 07:41', '2016-07-11 19:07', '2016-08-01 16:00']}
dt = pd.DataFrame(data)

Also, is there a way to draw a time-frequency plot from dt_month?

seanDot7 asked Jun 11 '26 08:06


1 Answer

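You can convert the timestamp column to datetime, group by its monthly period (to_period) and aggregate with size:

# the timestamps are stored as strings, so parse them before using .dt
dt['timestamp'] = pd.to_datetime(dt['timestamp'])

print (dt.text.groupby(dt.timestamp.dt.to_period('M'))
              .size()
              .rename('count')
              .reset_index())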

  timestamp  count
0   2016-06      2
1   2016-07      2
2   2016-08      1
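
For the plotting part of the question, one possible approach is a bar chart of the monthly counts. The following is only a sketch: it rebuilds dt from the question, assumes matplotlib is installed, and `plot_monthly_counts` is a hypothetical helper name chosen for illustration.

```python
import pandas as pd

# Rebuild the example data from the question.
data = {'text': ['a', 'b', 'c', 'd', 'e'],
        'timestamp': ['2016-06-13 18:00', '2016-06-20 14:08', '2016-07-01 07:41',
                      '2016-07-11 19:07', '2016-08-01 16:00']}
dt = pd.DataFrame(data)

# The timestamps are strings, so parse them before using the .dt accessor.
dt['timestamp'] = pd.to_datetime(dt['timestamp'])

# Count rows per calendar month.
dt_month = (dt.groupby(dt['timestamp'].dt.to_period('M'))
              .size()
              .rename('count')
              .reset_index())


def plot_monthly_counts(frame):
    """Draw a bar chart of rows per month (hypothetical helper; needs matplotlib)."""
    # Imported lazily so the counting code above runs without matplotlib.
    import matplotlib.pyplot as plt
    ax = frame.plot(x='timestamp', y='count', kind='bar', legend=False)
    ax.set_xlabel('month')
    ax.set_ylabel('count')
    plt.tight_layout()
    return ax
```

Calling `plot_monthly_counts(dt_month)` followed by `plt.show()` (after `import matplotlib.pyplot as plt`) should display one bar per month. A bar chart is used here because the months are discrete buckets; a line plot would also work if a trend over many months is of interest.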
jezrael answered Jun 12 '26 22:06


