 

How to select the first 3 rows of every group in pandas?

Tags:

python

pandas

I have a pandas DataFrame like this:

    id   prob
0    1   0.5   
1    1   0.6
2    1   0.4
3    1   0.2
4    2   0.3
6    2   0.5
...
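To make the answers reproducible, here is a minimal sketch that rebuilds only the rows printed above as a DataFrame. It assumes `id` is an integer column and keeps the printed row labels (including the jump from 4 to 6). The rows hidden behind `...` are not filled in.

```python
import pandas as pd

# Only the rows shown in the question; the elided "..." rows are not reconstructed
df = pd.DataFrame(
    {"id": [1, 1, 1, 1, 2, 2], "prob": [0.5, 0.6, 0.4, 0.2, 0.3, 0.5]},
    index=[0, 1, 2, 3, 4, 6],  # printed row labels; label 5 is absent as shown
)
print(df)
```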

I want to group it by 'id', sort each group by prob in descending order, and take the top 3 prob values of every group. Note that some groups contain fewer than 3 rows. Finally, I want to get a 2D array like:

[[1, 0.6, 0.5, 0.4], [2, [0.5, 0.3]], ...]

How can I do that with pandas? Thanks!

asked Sep 01 '17 03:09 by Ink



4 Answers

Use sort_values, groupby, and head:

df.sort_values(by=['id','prob'], ascending=[True,False]).groupby('id').head(3).values

Output:

array([[ 1. ,  0.6],
       [ 1. ,  0.5],
       [ 1. ,  0.4],
       [ 2. ,  0.5],
       [ 2. ,  0.3]])
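A self-contained version of this answer, built on only the sample rows from the question, can check the point about short groups: `head(3)` returns up to 3 rows per group, so the `id == 2` group simply contributes its 2 rows. The names `top` and `counts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 1, 1, 1, 2, 2], "prob": [0.5, 0.6, 0.4, 0.2, 0.3, 0.5]},
    index=[0, 1, 2, 3, 4, 6],
)

# Sort ids ascending and prob descending, then keep at most 3 rows per id
top = df.sort_values(by=["id", "prob"], ascending=[True, False]).groupby("id").head(3)

# Groups with fewer than 3 rows are returned whole rather than padded
counts = top.groupby("id").size()
print(top)
print(counts)
```

Because `head` keeps the original row labels, it is the trailing `.values` in the answer that drops them and yields the plain array.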

Following @COLDSPEED's lead:

df.sort_values(by=['id','prob'], ascending=[True,False])\
  .groupby('id').agg(lambda x: x.head(3).tolist())\
  .reset_index().values.tolist()

Output:

[[1, [0.6, 0.5, 0.4]], [2, [0.5, 0.3]]]
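A variant of the same idea, sketched on the sample rows only, selects the `prob` column before grouping and uses `SeriesGroupBy.apply`, which returns one Python list per `id`. A comprehension then pairs each `id` with its list. This avoids relying on `agg` accepting list-valued results; `top3` and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 1, 1, 1, 2, 2], "prob": [0.5, 0.6, 0.4, 0.2, 0.3, 0.5]},
    index=[0, 1, 2, 3, 4, 6],
)

# Series indexed by id; each value is a list of that id's top 3 probs, descending
top3 = (
    df.sort_values(["id", "prob"], ascending=[True, False])
    .groupby("id")["prob"]
    .apply(lambda s: s.head(3).tolist())
)

# Pair every id with its list to get the nested structure from the question
result = [[key, values] for key, values in top3.items()]
print(result)
```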
answered Oct 21 '22 02:10 by Scott Boston



You can use groupby with nlargest:

df.groupby('id').prob.nlargest(3).reset_index(level=1, drop=True)

id
1    0.6
1    0.5
1    0.4
2    0.5
2    0.3

To get an array:

import numpy as np

df1 = df.groupby('id').prob.nlargest(3).unstack(1)
np.column_stack((df1.index.values, df1.values))

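You get the following (the columns are the original row labels, which is why the values are not in descending order and each row is padded with NaN):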

array([[ 1. ,  0.5,  0.6,  0.4,  nan,  nan],
       [ 2. ,  nan,  nan,  nan,  0.3,  0.5]])
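If a rectangular array is acceptable but these label-based NaN columns are not, one option (a sketch on the sample rows, not part of the original answer) is to number the kept rows 0, 1, 2 within each `id` with `groupby().cumcount()` and pivot on that position instead of on the original row labels. Shorter groups are then padded with NaN only at the end. The `pos` column name is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 1, 1, 1, 2, 2], "prob": [0.5, 0.6, 0.4, 0.2, 0.3, 0.5]},
    index=[0, 1, 2, 3, 4, 6],
)

# Largest prob first within each id, keeping at most 3 rows per id
top = df.sort_values(["id", "prob"], ascending=[True, False]).groupby("id").head(3)

# Position of each kept row inside its id group: 0 for the largest, 1 next, ...
top = top.assign(pos=top.groupby("id").cumcount())

# One row per id, one column per position; missing positions become NaN
wide = top.pivot(index="id", columns="pos", values="prob")
arr = np.column_stack((wide.index.values, wide.values))
print(arr)
```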
answered Oct 21 '22 02:10 by Vaishali



If you're looking for a Series whose values are arrays, you can use np.sort:

import numpy as np

df = df.groupby('id').prob.apply(lambda x: np.sort(x.values)[:-4:-1])
df

id
1    [0.6, 0.5, 0.4]
2         [0.5, 0.3]

To retrieve the values, call reset_index and access .values:

df.reset_index().values

array([[1, array([ 0.6,  0.5,  0.4])],
       [2, array([ 0.5,  0.3])]], dtype=object)
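The `[:-4:-1]` slice is what keeps this working for short groups. A small sketch (the helper name `top3_desc` is hypothetical) shows that on the ascending result of `np.sort` it reads backward from the last element and stops before position -4, so it yields at most 3 values in descending order, and fewer when the group is smaller.

```python
import numpy as np


def top3_desc(values):
    # np.sort returns an ascending copy; [:-4:-1] reads it from the end backward
    # and stops before index -4 (clamped for short arrays): at most the 3 largest
    return np.sort(values)[:-4:-1]


long_group = np.array([0.5, 0.6, 0.4, 0.2])  # id 1 in the sample data
short_group = np.array([0.3, 0.5])           # id 2, only 2 rows

print(top3_desc(long_group))
print(top3_desc(short_group))
```

An equivalent, arguably more readable spelling is `np.sort(values)[::-1][:3]`.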
answered Oct 21 '22 04:10 by cs95



[[n, g.nlargest(3).tolist()] for n, g in df.groupby('id').prob]

[[1, [0.6, 0.5, 0.4]], [2, [0.5, 0.3]]]
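The comprehension generalizes easily to other columns and sizes. A sketch wrapping it in a function, run on the sample rows only; the name `top_k_per_group` and its parameters are hypothetical, not a pandas API.

```python
import pandas as pd


def top_k_per_group(frame, key, col, k=3):
    """Return [[group_key, [k largest values of col, descending]], ...]."""
    # Iterating a SeriesGroupBy yields (group key, Series of that group's values)
    return [[name, grp.nlargest(k).tolist()] for name, grp in frame.groupby(key)[col]]


df = pd.DataFrame(
    {"id": [1, 1, 1, 1, 2, 2], "prob": [0.5, 0.6, 0.4, 0.2, 0.3, 0.5]},
    index=[0, 1, 2, 3, 4, 6],
)

print(top_k_per_group(df, "id", "prob"))
print(top_k_per_group(df, "id", "prob", k=1))
```

`Series.nlargest` keeps the first occurrence on ties by default (`keep='first'`), which matters if a group has repeated probabilities at the cutoff.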
answered Oct 21 '22 03:10 by piRSquared
