Finding count of duplicate values and ordering in a Pandas dataframe

Question

I have a Pandas dataframe with many columns, two of them are "movie title" and "age", I want to find the top 5 movies with the lowest average age of the people that rated them, but only include movies which have at least 100 ratings (so appear in at least 100 rows).

For example:

movie title      age

Title 1          10
Title 2          12
Title 2          12
Title 3          13
Title 3          13
Title 3          13

Should become:

movie title     # of ratings     avg age

Title 1         1                    10
Title 2         2                    12
Title 3         3                    13

It can be in the same or a new dataframe. Thanks for your help!

Ami Tavory · Accepted Answer

Say you do

agg = df.age.groupby([df['movie title']]).agg({'ave_age': 'mean', 'size': 'size'})

You'll get a DataFrame with columns ave_age and size.

agg[agg['size'] > 100]

will give you only those that have more than 100 users. From there, sort by agg.ave_age and take the top 5. It should look something like this:

agg[agg['size'] > 100].sort_values(by='ave_age', ascending=True).head(5)

Alexander · Answer

The filter creates a flag for each movie that is set to True if the movie title count is more than one hundred and False otherwise.

n = 100
filter = (df.groupby(['movie title'])['age']
          .transform(lambda group: group.count()) >= n)

Given the small size of your sample data, I will set n to be 2 and create my filter.

Now I just filter on movies with a count exceeding n, calculate the average age per group, and then take the five smallest (i.e. lowest age).

>>> df[filter.values].groupby('movie title').age.mean().nsmallest(5)
movie title
Title 2    12
Title 3    13
Name: age, dtype: int64

Finding count of duplicate values and ordering in a Pandas dataframe

Tags:

python

pandas

user2453297

2 Answers

Ami Tavory

Alexander

Recent Activity

Donate For Us

Finding count of duplicate values and ordering in a Pandas dataframe

Tags:

python

pandas

user2453297

2 Answers

Ami Tavory

Alexander

Related questions

Recent Activity

Donate For Us