Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use argmin with groupby in pandas

Tags:

python

pandas

Suppose I have a pandas dataframe like this:

  cat  val
0   a    1
1   a    6
2   a   12
3   b    2
4   b    5
5   b   11
6   c    4
7   c   22

And I want to know, for each category (each value of 'cat') what is the position where the value is closest to a given value, say 5.5. I can subtract off my target value and take the absolute value, giving me something like this:

  cat  val  val_delt
0   a    1       4.5
1   a    6       0.5
2   a   12       6.5
3   b    2       3.5
4   b    5       0.5
5   b   11       5.5
6   c    4       1.5
7   c   22      16.5

But I'm stuck about where to go next. My first thought was to use argmin() with groupby(), but this gives an error:

In [375]: df.groupby('cat').val_delt.argmin()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-375-a2c3dbc43c50> in <module>()
----> 1 df.groupby('cat').val_delt.argmin()

TypeError: 'Series' object is not callable

I could, of course, come up with some horrible hacky thing in standard python where I iterate over all values of cat, then select the subset of my data corresponding to that value, perform the argmin operation then figure out where in the original dataframe that row was. But there's got to be a more elegant way to do this.

What I want as an output is either something like this:

  cat  val
1   a    6      
4   b    5       
6   c    4  

or at least some structure that contains that relevant information (eg - {'a':1, 'b':4, 'c':6} ). I don't care if I get back the index value or the index position, but I need one of the two. I don't care about getting back the value - I can always get that later once I have the index subset.

like image 994
user2543645 Avatar asked Oct 29 '13 00:10

user2543645


4 Answers

All answers here are somewhat correct, but none of them does it in a concise, beautiful and pythonic way. I leave here a clear way to do this.

>>> indx = df.groupby('cat')['val_delt'].idxmin()
>>> df.loc[indx]

  cat  val  val_delt
1   a    6       0.5
4   b    5       0.5
6   c    4       1.5
like image 147
Marcos Galletero Romero Avatar answered Oct 09 '22 12:10

Marcos Galletero Romero


You don't need the apply.

idxmin is sufficient. Just need to ensure you have set the index you want the min of.

>>> df['val_delt'] = (df.val - 5.5).abs()
>>> df.set_index('val').groupby('cat').idxmin()
     val_delt
cat          
a           6
b           5
c           4
like image 21
JSharm Avatar answered Oct 09 '22 13:10

JSharm


argmin() is not an agg function, you can use apply to get the closest index of every group:

txt = """  cat  val
0   a    1
1   a    6
2   a   12
3   b    2
4   b    5
5   b   11
6   c    4
7   c   22"""

import io

df = pd.read_csv(io.BytesIO(txt), delim_whitespace=True, index_col=0)
df["val_delt"] = (df.val - 5.5).abs()
idx = df.groupby("cat").apply(lambda df:df.val_delt.argmin())
df.ix[idx, :]

output:

cat  val  val_delt
1   a    6       0.5
4   b    5       0.5
6   c    4       1.5
like image 6
HYRY Avatar answered Oct 09 '22 14:10

HYRY


Just adding to HYRY answer, you can use idxmin. Example:

import io
txt = """  cat  val
0   a    1
1   a    6
2   a   12
3   b    2
4   b    5
5   b   11
6   c    4
7   c   22"""
df = pd.read_csv(io.BytesIO(txt.encode()), delim_whitespace=True, index_col=0)
df["val_delt"] = (df.val - 5.5).abs()
idx = df.groupby("cat").apply(lambda df:df.val_delt.idxmin())
df.ix[idx, :]
like image 4
Eran Rosenthal Avatar answered Oct 09 '22 13:10

Eran Rosenthal