Pandas plot cumulative sum of counters over time

I have the following dataframe:

    Joined      User ID
0   2017-08-19  user 182737081
1   2017-05-07  user 227151009
2   2017-11-29  user 227306568
3   2016-05-22  user 13661634
4   2017-01-23  user 220545735

I'm trying to figure out how to plot user growth over time, and I figured the best way is to plot a cumulative sum. I put together some simple code:

tmp = members[['Joined']].copy()
tmp['count'] = 1
tmp.set_index('Joined', inplace=True)

Calling tmp.cumsum() on this produces the following:

            count
Joined  
2017-08-19  1
2017-05-07  2
2017-11-29  3
2016-05-22  4
2017-01-23  5

Now when I try to plot this using tmp.plot(), I get something very strange:

[image: cumulative sum as plotted by pandas]

  1. I genuinely have no idea what this plot is actually displaying (it looks like some kind of cumulative delta trend line?)
  2. How do I plot cumulative user growth over time? 📈

The version of pandas I'm using: pandas (0.20.3)

In case you are curious whether the length of the series is the same as the highest count:

tmp.cumsum().max() == len(tmp)

count  True
dtype: bool
milosgajdos asked Dec 24 '22 10:12


1 Answer

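It seems you need sort_index, then cumsum, then plot. Because the index is not sorted by date, cumsum accumulates in row order and the line plot connects the points in that same order, so the line jumps back and forth along the time axis.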

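# Uncomment if the index holds strings rather than datetimes:
# tmp.index = pd.to_datetime(tmp.index)

# Sort by date first so the running total accumulates chronologically
tmp.sort_index().cumsum().plot()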
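For anyone who wants to try this without the original data, here is a minimal sketch that rebuilds the five sample rows from the question. The `members` frame is reconstructed by hand, and the names `unsorted_growth` and `growth` are only for illustration; the real dataframe presumably has many more rows.

```python
import pandas as pd

# Sample rows copied from the question; `members` is rebuilt here for illustration
members = pd.DataFrame({
    "Joined": ["2017-08-19", "2017-05-07", "2017-11-29", "2016-05-22", "2017-01-23"],
    "User ID": ["user 182737081", "user 227151009", "user 227306568",
                "user 13661634", "user 220545735"],
})

tmp = members[["Joined"]].copy()
tmp["Joined"] = pd.to_datetime(tmp["Joined"])  # make sure the dates are real datetimes
tmp["count"] = 1
tmp = tmp.set_index("Joined")

# Row order: running totals get attached to dates in whatever order the rows appear
unsorted_growth = tmp.cumsum()

# Date order: sort first, then accumulate
growth = tmp.sort_index().cumsum()

print(growth)
# growth.plot()  # draws the chart; needs matplotlib installed
```

In `unsorted_growth` the earliest date (2016-05-22) carries a total of 4, which is why the original line doubles back on itself. In `growth` the totals rise together with the dates.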

BENY answered Dec 25 '22 23:12