I have a pandas DataFrame that details online activity in terms of "clicks" during a user session. There are as many as 50,000 unique users, and the DataFrame has around 1.5 million rows, so most users have multiple records.
The four columns are a unique user ID, the date the user began the service ("Registration"), the date the user used the service ("Session"), and the total number of clicks.
The DataFrame is organized as follows:
User_ID Registration Session clicks
2349876 2012-02-22 2014-04-24 2
1987293 2011-02-01 2013-05-03 1
2234214 2012-07-22 2014-01-22 7
9874452 2010-12-22 2014-08-22 2
...
(There is also an index beginning with 0, but one could set User_ID as the index.)
I would like to aggregate the total number of clicks per user since the Registration date. The resulting DataFrame (or pandas Series) would list User_ID and "Total_Clicks":
User_ID Total_Clicks
2349876 722
1987293 341
2234214 220
9874452 1405
...
How does one do this in pandas? Is this done with .agg()? Each User_ID needs to be summed individually.
As there are 1.5 million records, does this scale?
IIUC you can use groupby, sum and reset_index:
print(df)
User_ID Registration Session clicks
0 2349876 2012-02-22 2014-04-24 2
1 1987293 2011-02-01 2013-05-03 1
2 2234214 2012-07-22 2014-01-22 7
3 9874452 2010-12-22 2014-08-22 2
print(df.groupby('User_ID')['clicks'].sum().reset_index())
User_ID clicks
0 1987293 1
1 2234214 7
2 2349876 2
3 9874452 2
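If you want to reproduce this end to end, here is a minimal, self-contained sketch (the values are the sample rows from the question; nothing beyond pandas is assumed):

import pandas as pd

# build the sample DataFrame from the question
df = pd.DataFrame({
    'User_ID': [2349876, 1987293, 2234214, 9874452],
    'Registration': ['2012-02-22', '2011-02-01', '2012-07-22', '2010-12-22'],
    'Session': ['2014-04-24', '2013-05-03', '2014-01-22', '2014-08-22'],
    'clicks': [2, 1, 7, 2],
})

# one row per User_ID with the summed clicks
print(df.groupby('User_ID')['clicks'].sum().reset_index())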
If the first column User_ID is the index:
print(df)
Registration Session clicks
User_ID
2349876 2012-02-22 2014-04-24 2
1987293 2011-02-01 2013-05-03 1
2234214 2012-07-22 2014-01-22 7
9874452 2010-12-22 2014-08-22 2
print(df.groupby(level=0)['clicks'].sum().reset_index())
User_ID clicks
0 1987293 1
1 2234214 7
2 2349876 2
3 9874452 2
Or:
print(df.groupby(df.index)['clicks'].sum().reset_index())
User_ID clicks
0 1987293 1
1 2234214 7
2 2349876 2
3 9874452 2
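As for the .agg() part of the question: groupby(...).agg('sum') is equivalent to the .sum() calls above, and you can rename the aggregated column to the Total_Clicks name from the desired output at the same time. A minimal sketch, assuming User_ID is a regular column:

# .agg('sum') is equivalent to .sum() here
totals = df.groupby('User_ID')['clicks'].agg('sum').reset_index()

# rename to match the desired output column
totals = totals.rename(columns={'clicks': 'Total_Clicks'})
print(totals)

On pandas 0.25+, named aggregation does both steps in one call: df.groupby('User_ID').agg(Total_Clicks=('clicks', 'sum')).reset_index().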
EDIT:
As Alexander pointed out, you need to filter the data before groupby if a Session date is earlier than the Registration date for a User_ID:
print(df)
User_ID Registration Session clicks
0 2349876 2012-02-22 2014-04-24 2
1 1987293 2011-02-01 2013-05-03 1
2 2234214 2012-07-22 2014-01-22 7
3 9874452 2010-12-22 2014-08-22 2
print(df[df.Session >= df.Registration].groupby('User_ID')['clicks'].sum().reset_index())
User_ID clicks
0 1987293 1
1 2234214 7
2 2349876 2
3 9874452 2
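One caveat worth adding: this comparison is safe if both columns are already datetime dtype; if they are stored as strings, it only works here because ISO YYYY-MM-DD dates happen to sort lexicographically. To be robust regardless of format, convert with pd.to_datetime first; a small sketch:

# make the comparison dtype-safe regardless of string format
df['Registration'] = pd.to_datetime(df['Registration'])
df['Session'] = pd.to_datetime(df['Session'])

valid = df[df.Session >= df.Registration]
print(valid.groupby('User_ID')['clicks'].sum().reset_index())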
I changed the third row of the data for a better sample:
print(df)
Registration Session clicks
User_ID
2349876 2012-02-22 2014-04-24 2
1987293 2011-02-01 2013-05-03 1
2234214 2012-07-22 2012-01-22 7
9874452 2010-12-22 2014-08-22 2
print(df.Session >= df.Registration)
User_ID
2349876 True
1987293 True
2234214 False
9874452 True
dtype: bool
print(df[df.Session >= df.Registration])
Registration Session clicks
User_ID
2349876 2012-02-22 2014-04-24 2
1987293 2011-02-01 2013-05-03 1
9874452 2010-12-22 2014-08-22 2
df1 = df[df.Session >= df.Registration]
print(df1.groupby(df1.index)['clicks'].sum().reset_index())
User_ID clicks
0 1987293 1
1 2349876 2
2 9874452 2
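On the scaling question: groupby(...).sum() is a vectorized, hash-based aggregation, so 1.5 million rows over 50,000 users is comfortably small for pandas on ordinary hardware. If you want to check on your own machine, here is a rough sketch with synthetic data of the same shape (timings will of course vary):

import numpy as np
import pandas as pd
from time import perf_counter

# synthetic data matching the question's shape: 1.5M rows, 50k users
rng = np.random.default_rng(0)
big = pd.DataFrame({
    'User_ID': rng.integers(0, 50_000, size=1_500_000),
    'clicks': rng.integers(1, 10, size=1_500_000),
})

t0 = perf_counter()
totals = big.groupby('User_ID')['clicks'].sum()
print(f'{perf_counter() - t0:.3f} s for {len(big):,} rows')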