Let's say I have this kind of data. It's a set of reviews of some products.
prod_id text rating
AB123 some text 5
AB123 some text 2
AB123 some text 4
AC456 some text 3
AC456 some text 2
AD777 some text 2
AD777 some text 5
AD777 some text 5
AD777 some text 4
AE999 some text 4
AF000 some text 5
AG222 some text 5
AG222 some text 3
AG222 some text 3
I want to know which products have the most reviews (the most rows), so I use the following code to get the 3 most-reviewed products.
s = df['prod_id'].value_counts().sort_values(ascending=False).head(3)
This gives me the following result.
AD777 4
AB123 3
AG222 3
But what I actually need is the full rows for those ids. I need every row for AD777, AB123, and AG222, like below.
prod_id text rating
AD777 some text 2
AD777 some text 5
AD777 some text 5
AD777 some text 4
AB123 some text 5
AB123 some text 2
AB123 some text 4
AG222 some text 5
AG222 some text 3
AG222 some text 3
How do I do that? I tried print(df.iloc[s]), but of course it doesn't work. As I read in the documentation, value_counts returns a Series, not a DataFrame. Any ideas? Thanks.
I think you need merge with a left join, using a DataFrame created from the index of s:
df = pd.DataFrame({'prod_id':s.index}).merge(df, how='left')
print (df)
prod_id text rating
0 AD777 some text 2
1 AD777 some text 5
2 AD777 some text 5
3 AD777 some text 4
4 AB123 some text 5
5 AB123 some text 2
6 AB123 some text 4
7 AG222 some text 5
8 AG222 some text 3
9 AG222 some text 3
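For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question's data, and the variable names top, merged and filtered are my own, not from the original post. It also shows boolean indexing with isin as a possible alternative, which is a different technique from the merge above: it keeps the rows in their original order instead of ordering them by review count.

```python
import pandas as pd

# Sample data rebuilt from the question (14 reviews, 6 products).
df = pd.DataFrame({
    "prod_id": ["AB123"] * 3 + ["AC456"] * 2 + ["AD777"] * 4
               + ["AE999", "AF000"] + ["AG222"] * 3,
    "text": ["some text"] * 14,
    "rating": [5, 2, 4, 3, 2, 2, 5, 5, 4, 4, 5, 5, 3, 3],
})

# value_counts already sorts by count in descending order,
# so head(3) yields the three most-reviewed product ids.
top = df["prod_id"].value_counts().head(3)

# Approach from the answer: a left merge keeps the order of the
# top ids (most-reviewed product first).
merged = pd.DataFrame({"prod_id": top.index}).merge(df, on="prod_id", how="left")

# Alternative sketch: filter with isin, which keeps the original
# row order and the original index labels of df.
filtered = df[df["prod_id"].isin(top.index)]

print(merged)
print(filtered)
```

Which one to use probably depends on whether you care about the ordering: merge if the most-reviewed product should come first, isin if you just want the matching rows as they appear in df.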