
Creating a pandas pivot table to count number of times items appear in a list together

I am trying to count the number of times users look at pages in the same session.

I am starting with a data frame listing user_ids and the page slugs they have visited:

user_id page_view_page_slug
1       slug1
1       slug2
1       slug3
1       slug4
2       slug5
2       slug3
2       slug2
2       slug1
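
For anyone who wants to reproduce this, the sample above can be built as follows. This is a minimal sketch that assumes the frame is called `df`, the name used in the code further down; the values are copied from the table.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "page_view_page_slug": ["slug1", "slug2", "slug3", "slug4",
                            "slug5", "slug3", "slug2", "slug1"],
})
```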

What I want is a pivot table that counts, for each pair of slugs, the number of user_ids that viewed both:

       slug1  slug2  slug3  slug4  slug5
slug1      2      2      2      1      1
slug2      2      2      2      1      1
slug3      2      2      2      1      1
slug4      1      1      1      1      0
slug5      1      1      1      0      1

I realize the table is symmetric (slug1 vs. slug2 holds the same count as slug2 vs. slug1), but I can't think of a better way. So far I have aggregated the slugs into one list per user:

def listagg(df, grouping_idx):
    # Collect the values of every other column into one list per group
    return df.groupby(grouping_idx).agg(list)

new_df = listagg(df, 'user_id')

Returning:

          page_view_page_slug
user_id                                                   
1        [slug1, slug2, slug3, slug4]
2        [slug5, slug3, slug2, slug2]
7        [slug6, slug4, slug7]
9        [slug3, slug5, slug1]

But I am struggling to come up with a loop that counts how often items appear in a list together (regardless of order) and with a way to store those counts. I also do not know how I would get the result into a pivot-table format.
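
The loop described above could be sketched roughly like this. It is only one possible approach, not taken from the answer below: it assumes the sample frame shown earlier, and the names `slugs_per_user`, `pair_counts` and `co_occurrence` are made up for illustration. A `Counter` keyed by slug pairs stores the counts, and `unstack` turns it into the square table.

```python
from collections import Counter
from itertools import product

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "page_view_page_slug": ["slug1", "slug2", "slug3", "slug4",
                            "slug5", "slug3", "slug2", "slug1"],
})

# One set of distinct slugs per user; a set ignores order and repeat visits
slugs_per_user = df.groupby("user_id")["page_view_page_slug"].agg(set)

pair_counts = Counter()
for slugs in slugs_per_user:
    # Every ordered pair, including (slug, slug), gets one count per user
    pair_counts.update(product(slugs, repeat=2))

# Turn the {(row_slug, col_slug): count} mapping into a square table
index = pd.MultiIndex.from_tuples(list(pair_counts.keys()))
co_occurrence = (
    pd.Series(list(pair_counts.values()), index=index)
    .unstack(fill_value=0)   # pairs that never occur together become 0
    .sort_index()
    .sort_index(axis=1)
)
```

Using `product(..., repeat=2)` fills both halves of the symmetric table and the diagonal in one pass, at the cost of counting each unordered pair twice.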

young_matt asked Feb 03 '21 22:02


1 Answer

One way is to use NumPy broadcasting: compare each value in user_id with every other value to get a boolean matrix that is True wherever two rows belong to the same user. Build a new dataframe from this matrix with both index and columns set to page_view_page_slug, then sum over the duplicate labels along the rows and along the columns to count the user_ids for each pair of slugs:

a = df['user_id'].values
i = list(df['page_view_page_slug'])

# sum(level=...) was removed in pandas 2.0, so group by the labels instead
same_user = pd.DataFrame(a[:, None] == a, index=i, columns=i)
same_user.groupby(level=0).sum().T.groupby(level=0).sum().T.astype(int)

       slug1  slug2  slug3  slug4  slug5
slug1      2      2      2      1      1
slug2      2      2      2      1      1
slug3      2      2      2      1      1
slug4      1      1      1      1      0
slug5      1      1      1      0      1
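
As an alternative sketch (not part of the original answer), the same table can be obtained from a user-by-slug indicator matrix multiplied by its own transpose. This assumes the sample frame from the question; `visited` and `co_occurrence` are illustrative names. Because the indicator is clipped to 1, a user who views the same slug several times is still counted once, whereas the broadcasting comparison would count each repeated row separately.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "page_view_page_slug": ["slug1", "slug2", "slug3", "slug4",
                            "slug5", "slug3", "slug2", "slug1"],
})

# Users x slugs indicator matrix: 1 if the user viewed the slug at least once
visited = pd.crosstab(df["user_id"], df["page_view_page_slug"]).clip(upper=1)

# (slugs x users) @ (users x slugs): entry [a, b] counts users who viewed both
co_occurrence = (visited.T @ visited).rename_axis(index=None, columns=None)
```

The intermediate matrix has one row per user rather than one row per page view, so it tends to stay smaller than the row-by-row comparison when users have many views.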

Shubham Sharma answered Sep 25 '22 00:09