How to use argmin with groupby in pandas

Question

Suppose I have a pandas dataframe like this:

And I want to know, for each category (each value of 'cat') what is the position where the value is closest to a given value, say 5.5. I can subtract off my target value and take the absolute value, giving me something like this:

  cat  val  val_delt
0   a    1       4.5
1   a    6       0.5
2   a   12       6.5
3   b    2       3.5
4   b    5       0.5
5   b   11       5.5
6   c    4       1.5
7   c   22      16.5

But I'm stuck about where to go next. My first thought was to use argmin() with groupby(), but this gives an error:

In [375]: df.groupby('cat').val_delt.argmin()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-375-a2c3dbc43c50> in <module>()
----> 1 df.groupby('cat').val_delt.argmin()

TypeError: 'Series' object is not callable

I could, of course, come up with some horrible hacky thing in standard python where I iterate over all values of cat, then select the subset of my data corresponding to that value, perform the argmin operation then figure out where in the original dataframe that row was. But there's got to be a more elegant way to do this.

What I want as an output is either something like this:

or at least some structure that contains that relevant information (eg - {'a':1, 'b':4, 'c':6} ). I don't care if I get back the index value or the index position, but I need one of the two. I don't care about getting back the value - I can always get that later once I have the index subset.

Marcos Galletero Romero · Accepted Answer

All answers here are somewhat correct, but none of them does it in a concise, beautiful and pythonic way. I leave here a clear way to do this.

>>> indx = df.groupby('cat')['val_delt'].idxmin()
>>> df.loc[indx]

  cat  val  val_delt
1   a    6       0.5
4   b    5       0.5
6   c    4       1.5

JSharm · Answer

You don't need the apply.

idxmin is sufficient. Just need to ensure you have set the index you want the min of.

>>> df['val_delt'] = (df.val - 5.5).abs()
>>> df.set_index('val').groupby('cat').idxmin()
     val_delt
cat          
a           6
b           5
c           4

HYRY · Answer

argmin() is not an agg function, you can use apply to get the closest index of every group:

txt = """  cat  val
0   a    1
1   a    6
2   a   12
3   b    2
4   b    5
5   b   11
6   c    4
7   c   22"""

import io

df = pd.read_csv(io.BytesIO(txt), delim_whitespace=True, index_col=0)
df["val_delt"] = (df.val - 5.5).abs()
idx = df.groupby("cat").apply(lambda df:df.val_delt.argmin())
df.ix[idx, :]

output:

cat  val  val_delt
1   a    6       0.5
4   b    5       0.5
6   c    4       1.5

Eran Rosenthal · Answer

Just adding to HYRY answer, you can use idxmin. Example:

import io
txt = """  cat  val
0   a    1
1   a    6
2   a   12
3   b    2
4   b    5
5   b   11
6   c    4
7   c   22"""
df = pd.read_csv(io.BytesIO(txt.encode()), delim_whitespace=True, index_col=0)
df["val_delt"] = (df.val - 5.5).abs()
idx = df.groupby("cat").apply(lambda df:df.val_delt.idxmin())
df.ix[idx, :]

How to use argmin with groupby in pandas

Tags:

python

pandas

user2543645

4 Answers

Marcos Galletero Romero

JSharm

HYRY

Eran Rosenthal

Recent Activity

Donate For Us

How to use argmin with groupby in pandas

Tags:

python

pandas

user2543645

4 Answers

Marcos Galletero Romero

JSharm

HYRY

Eran Rosenthal

Related questions

Recent Activity

Donate For Us