
Get top rows from column value count with pandas

Tags:

python

pandas

Let's say I have this kind of data. It's a set of reviews of some products.

prod_id text    rating
AB123   some text   5
AB123   some text   2
AB123   some text   4
AC456   some text   3
AC456   some text   2
AD777   some text   2
AD777   some text   5
AD777   some text   5
AD777   some text   4
AE999   some text   4
AF000   some text   5
AG222   some text   5
AG222   some text   3
AG222   some text   3

I want to know which products have the most reviews (the most rows), so I use the following code to get the three most-reviewed products.

s = df['prod_id'].value_counts().sort_values(ascending=False).head(3)

This gives the following result.

AD777   4
AB123   3
AG222   3
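
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample table and runs the count. The rows are copied from the table above. AB123 and AG222 are tied at 3 reviews each, and I am not certain their relative order is the same across pandas versions, so treat the order of those two as an assumption.

```python
import pandas as pd

# Sample data copied from the table in the question
prod_ids = (["AB123"] * 3 + ["AC456"] * 2 + ["AD777"] * 4
            + ["AE999"] + ["AF000"] + ["AG222"] * 3)
ratings = [5, 2, 4, 3, 2, 2, 5, 5, 4, 4, 5, 5, 3, 3]
df = pd.DataFrame({"prod_id": prod_ids, "text": "some text", "rating": ratings})

# value_counts() sorts by count in descending order by default,
# so head(3) keeps the three most-reviewed product ids
s = df["prod_id"].value_counts().head(3)
print(s)
```

The index of `s` holds the product ids and the values hold the counts, which is why passing `s` to `df.iloc` does not select the matching rows.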

But what I actually need is the full rows for those ids: every row for AD777, AB123, and AG222, like below.

prod_id text    rating
AD777   some text   2
AD777   some text   5
AD777   some text   5
AD777   some text   4
AB123   some text   5
AB123   some text   2
AB123   some text   4
AG222   some text   5
AG222   some text   3
AG222   some text   3

How do I do that? I tried print(df.iloc[s]), but of course that does not work. According to the documentation, value_counts returns a Series, not a DataFrame. Any ideas? Thanks.

catris25 asked Dec 24 '22 15:12



1 Answer

I think you need a left-join merge with a DataFrame created from the index of s:

df = pd.DataFrame({'prod_id':s.index}).merge(df, how='left')
print(df)
  prod_id       text  rating
0   AD777  some text       2
1   AD777  some text       5
2   AD777  some text       5
3   AD777  some text       4
4   AB123  some text       5
5   AB123  some text       2
6   AB123  some text       4
7   AG222  some text       5
8   AG222  some text       3
9   AG222  some text       3
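
A self-contained sketch of the same idea, for reference. It rebuilds the sample data from the question and merges into a new variable instead of overwriting df; the names `top_rows` and `top_rows_isin` are chosen here for illustration. As an alternative not mentioned above, boolean indexing with `Series.isin` selects the same rows, but keeps them in their original order rather than ordering them by review count.

```python
import pandas as pd

# Sample data copied from the table in the question
prod_ids = (["AB123"] * 3 + ["AC456"] * 2 + ["AD777"] * 4
            + ["AE999"] + ["AF000"] + ["AG222"] * 3)
ratings = [5, 2, 4, 3, 2, 2, 5, 5, 4, 4, 5, 5, 3, 3]
df = pd.DataFrame({"prod_id": prod_ids, "text": "some text", "rating": ratings})

# Three most-reviewed product ids (index) with their counts (values)
s = df["prod_id"].value_counts().head(3)

# Left join: the left frame has one row per top id, so the result
# follows the order of s and pulls in every matching row of df
top_rows = pd.DataFrame({"prod_id": s.index}).merge(df, on="prod_id", how="left")
print(top_rows)

# Alternative (assumption: original row order is acceptable):
# a boolean mask that keeps rows whose prod_id is one of the top ids
top_rows_isin = df[df["prod_id"].isin(s.index)]
print(top_rows_isin)
```

The merge version is the better fit when the rows should come out grouped from most-reviewed to least-reviewed; the `isin` version is shorter when order does not matter.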
jezrael answered Jan 08 '23 13:01
