Suppose I have a pandas dataframe like this:
cat val
0 a 1
1 a 6
2 a 12
3 b 2
4 b 5
5 b 11
6 c 4
7 c 22
And I want to know, for each category (each value of 'cat') what is the position where the value is closest to a given value, say 5.5. I can subtract off my target value and take the absolute value, giving me something like this:
cat val val_delt
0 a 1 4.5
1 a 6 0.5
2 a 12 6.5
3 b 2 3.5
4 b 5 0.5
5 b 11 5.5
6 c 4 1.5
7 c 22 16.5
But I'm stuck about where to go next. My first thought was to use argmin() with groupby(), but this gives an error:
In [375]: df.groupby('cat').val_delt.argmin()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-375-a2c3dbc43c50> in <module>()
----> 1 df.groupby('cat').val_delt.argmin()
TypeError: 'Series' object is not callable
I could, of course, come up with some horrible hacky thing in standard python where I iterate over all values of cat, then select the subset of my data corresponding to that value, perform the argmin operation then figure out where in the original dataframe that row was. But there's got to be a more elegant way to do this.
What I want as an output is either something like this:
cat val
1 a 6
4 b 5
6 c 4
or at least some structure that contains that relevant information (eg - {'a':1, 'b':4, 'c':6} ). I don't care if I get back the index value or the index position, but I need one of the two. I don't care about getting back the value - I can always get that later once I have the index subset.
All answers here are somewhat correct, but none of them does it in a concise, beautiful and pythonic way. I leave here a clear way to do this.
>>> indx = df.groupby('cat')['val_delt'].idxmin()
>>> df.loc[indx]
cat val val_delt
1 a 6 0.5
4 b 5 0.5
6 c 4 1.5
You don't need the apply.
idxmin
is sufficient. Just need to ensure you have set the index you want the min of.
>>> df['val_delt'] = (df.val - 5.5).abs()
>>> df.set_index('val').groupby('cat').idxmin()
val_delt
cat
a 6
b 5
c 4
argmin()
is not an agg function, you can use apply to get the closest index of every group:
txt = """ cat val
0 a 1
1 a 6
2 a 12
3 b 2
4 b 5
5 b 11
6 c 4
7 c 22"""
import io
df = pd.read_csv(io.BytesIO(txt), delim_whitespace=True, index_col=0)
df["val_delt"] = (df.val - 5.5).abs()
idx = df.groupby("cat").apply(lambda df:df.val_delt.argmin())
df.ix[idx, :]
output:
cat val val_delt
1 a 6 0.5
4 b 5 0.5
6 c 4 1.5
Just adding to HYRY answer, you can use idxmin. Example:
import io
txt = """ cat val
0 a 1
1 a 6
2 a 12
3 b 2
4 b 5
5 b 11
6 c 4
7 c 22"""
df = pd.read_csv(io.BytesIO(txt.encode()), delim_whitespace=True, index_col=0)
df["val_delt"] = (df.val - 5.5).abs()
idx = df.groupby("cat").apply(lambda df:df.val_delt.idxmin())
df.ix[idx, :]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With