After some transformations I got the following dataframe. How do I get the top n records for each value of a column, in this case short_name, using another column, frequency, as the ranking indicator? I read this post, but the problem with both solutions is that they drop the product_id column: they keep only the grouped column, and I need to keep all of them.
```
           short_name  product_id  frequency
0  Yoghurt y cereales   975009684         32
1  Yoghurt y cereales   975009685         21
2  Yoghurt y cereales   975009700         16
3  Yoghurt y Cereales       21097         16
4     Yoghurt Bebible       21329         68
5     Yoghurt Bebible       21328         67
6     Yoghurt Bebible       21500         31
```
`head(n)` returns the first n rows of the DataFrame. It takes one optional argument, n (the number of rows to return from the start). If n is not passed, it defaults to 5 and the first 5 rows are returned.
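A minimal sketch of `head` on a small hypothetical DataFrame (the column name `x` is made up for illustration), showing the default and an explicit n:

```python
import pandas as pd

# Hypothetical example data: ten rows in a single integer column "x"
df = pd.DataFrame({"x": range(10)})

first_five = df.head()    # n defaults to 5
first_three = df.head(3)  # explicit n

print(first_five["x"].tolist())   # [0, 1, 2, 3, 4]
print(first_three["x"].tolist())  # [0, 1, 2]
```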
The `values` property returns all values in the DataFrame as a 2-dimensional array, with one inner array for each row.
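A short sketch with a hypothetical two-column DataFrame, showing that each row becomes one inner array. The pandas documentation recommends `to_numpy()` over `.values`; both are shown here:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

arr = df.values                  # 2-D NumPy array, one inner array per row
print(arr.shape)                 # (2, 2)
print(arr.tolist())              # [[1, 3], [2, 4]]

# Equivalent result via the recommended method
print(df.to_numpy().tolist())    # [[1, 3], [2, 4]]
```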
The pandas `nlargest` method can order the top rows by more than one column. Given a list of column names, it returns the first n rows ordered by those columns in descending order: the first column decides the ranking and the following ones break ties. For example, you can get the top 3 rows with the largest values in column `lifeExp` and then `gdpPercap`.
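A minimal sketch of the multi-column case. The rows below are hypothetical values that only reuse the column names `lifeExp` and `gdpPercap`; they are not real gapminder data. Three countries tie on `lifeExp`, so `gdpPercap` decides which of them make the cut:

```python
import pandas as pd

# Hypothetical data; only the column names come from the text above
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "lifeExp": [80.0, 82.5, 80.0, 79.1, 80.0],
    "gdpPercap": [30000, 25000, 45000, 50000, 41000],
})

# Rank by lifeExp (descending); ties on lifeExp are broken by gdpPercap
top3 = df.nlargest(3, ["lifeExp", "gdpPercap"])
print(top3["country"].tolist())  # ['B', 'C', 'E']
```

Country D has the highest `gdpPercap` but is excluded, because `gdpPercap` is only consulted among rows that tie on `lifeExp`.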
I'd try to use the `nlargest` method:
```
In [5]: df.groupby('short_name', as_index=False).apply(lambda x: x.nlargest(2, 'frequency'))
Out[5]:
             short_name  product_id  frequency
0 4     Yoghurt Bebible       21329         68
  5     Yoghurt Bebible       21328         67
1 3  Yoghurt y Cereales       21097         16
2 0  Yoghurt y cereales   975009684         32
  1  Yoghurt y cereales   975009685         21
```
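A commonly used alternative, sketched here as a self-contained script built from the question's table (this is a different technique from the `nlargest`/`apply` call, not the answer's own code): sort by frequency once, then take the first n rows of each group with `groupby(...).head(n)`. Because `head` returns rows of the original frame, every column, including product_id, and the original index are kept. The variable `n` is just an illustrative parameter.

```python
import pandas as pd

# The DataFrame from the question
df = pd.DataFrame({
    "short_name": ["Yoghurt y cereales"] * 3
    + ["Yoghurt y Cereales"]
    + ["Yoghurt Bebible"] * 3,
    "product_id": [975009684, 975009685, 975009700, 21097, 21329, 21328, 21500],
    "frequency": [32, 21, 16, 16, 68, 67, 31],
})

n = 2  # number of records to keep per short_name

# Highest frequency first, then the first n rows of each short_name group.
# All columns and the original row labels are preserved.
top_n = (
    df.sort_values("frequency", ascending=False)
      .groupby("short_name")
      .head(n)
)
print(top_n)
```

This keeps rows 4, 5, 0, 1 and 3 of the original frame, the same records as the `nlargest` output above but without the extra group level in the index. Grouping is case-sensitive, so "Yoghurt y cereales" and "Yoghurt y Cereales" remain separate groups. In recent pandas releases (2.2 and later) the `apply` version may also emit a DeprecationWarning because the lambda receives the grouping column; this route does not use `apply`.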