I have a dataframe of tweets and I'm looking to group the dataframe by date and generate a column that contains a cumulative list of all the unique users who have posted up to that date. None of the existing functions (e.g., cumsum) would appear to work for this. Here's a sample of the original tweet dataframe, where the index (created_at) is in datetime format:
In [3]: df
Out[3]:
            screen_name
created_at
04-01-16            Bob
04-01-16            Bob
04-01-16          Sally
04-01-16          Sally
04-02-16            Bob
04-02-16         Miguel
04-02-16            Tim
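For reference, a frame like the sample above can be rebuilt with something along these lines (a minimal sketch; the column and index names come from the question, the construction itself is my own):

import pandas as pd

# Hypothetical reconstruction of the sample data: a DatetimeIndex named
# 'created_at' and a single 'screen_name' column.
df = pd.DataFrame(
    {'screen_name': ['Bob', 'Bob', 'Sally', 'Sally', 'Bob', 'Miguel', 'Tim']},
    index=pd.to_datetime(['2016-04-01'] * 4 + ['2016-04-02'] * 3),
)
df.index.name = 'created_at'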
I can collapse the dataset by date and get a column with the unique users per day:
In [4]: df[['screen_name']].groupby(df.index.date).aggregate(lambda x: set(x))
Out[4]:
                   screen_name
2016-04-01        {Bob, Sally}
2016-04-02  {Bob, Miguel, Tim}
So far so good. But what I'd like is to have a "cumulative set" like this:
Out[4]:
           Cumulative_list_up_to_this_date  Cumulative_number_of_unique_users
2016-04-01                    {Bob, Sally}                                  2
2016-04-02       {Bob, Sally, Miguel, Tim}                                  4
Ultimately, what I am really interested in is the cumulative number in the last column so I can plot it. I've considered looping over dates and other things but can't seem to find a good way. Thanks in advance for any help.
You cannot add sets, but you can add lists! So build a list of users per day, take the cumulative sum (adding lists concatenates them, so each row accumulates everyone seen so far), and finally apply the set constructor to get rid of duplicates.
# Collect each day's users into a list, concatenate the lists cumulatively
# (cumsum on lists is repeated list addition), then turn each running list
# into a set to drop duplicate names.
cum_names = (df['screen_name'].groupby(df.index.date)
                              .agg(lambda x: list(x))
                              .cumsum()
                              .apply(set))
# 2016-04-01                 {Bob, Sally}
# 2016-04-02    {Bob, Miguel, Tim, Sally}
# dtype: object

# The cumulative number of unique users is just the size of each set.
cum_count = cum_names.apply(len)
# 2016-04-01    2
# 2016-04-02    4
# dtype: int64
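Since the end goal is a plot of the cumulative count, pandas' built-in plotting can be applied directly to cum_count (a minimal sketch assuming matplotlib is installed; the marker and axis label are just placeholders):

import matplotlib.pyplot as plt

# Plot the cumulative number of unique users against the date index.
ax = cum_count.plot(marker='o')
ax.set_ylabel('Cumulative unique users')
plt.show()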