Pandas set value in groupby

Tags: python, pandas, numpy

I have a DataFrame...

>>> df = pd.DataFrame({
...            'letters' : ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'], 
...            'is_min' : np.zeros(9),
...            'numbers' : np.random.randn(9)
... })

    is_min  letters numbers
0   0       a       0.322499
1   0       a      -0.196617
2   0       a      -1.194251
3   0       b       1.005323
4   0       b      -0.186364
5   0       b      -1.886273
6   0       c       0.014960
7   0       c      -0.832713
8   0       c       0.689531

I would like to set 'is_min' to 1 on the row where 'numbers' is the minimum within each 'letters' group. I have tried this and feel that I am close...

>>> df.groupby('letters')['numbers'].transform('idxmin')

0    2
1    2
2    2
3    5
4    5
5    5
6    7
7    7
8    7
dtype: int64

I am having a hard time connecting the dots to set the value of 'is_min' to 1.

asked Jan 27 '16 19:01

Bruce Pucci

2 Answers

Pass the row labels to loc and set the column:

In [34]:
df.loc[df.groupby('letters')['numbers'].transform('idxmin'), 'is_min']=1
df

Out[34]:
   is_min letters   numbers
0       1       a -0.374751
1       0       a  1.663334
2       0       a -0.123599
3       1       b -2.156204
4       0       b  0.201493
5       0       b  1.639512
6       0       c -0.447271
7       0       c  0.017204
8       1       c -1.261621

What's happening here: transform('idxmin') returns, for every row, the index label of the minimum within that row's group. Passing those labels to loc selects exactly the rows that hold each group's minimum, and these get set to 1 as desired.
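To make the two steps visible, here is a minimal sketch that separates the label lookup from the assignment. The numbers are made up (fixed instead of random) so that the minima sit at rows 2, 5 and 7, as in the question's output; the names labels and marked_rows are only for illustration.

```python
import numpy as np
import pandas as pd

# Fixed, made-up numbers so the result is reproducible.
df = pd.DataFrame({
    'letters': list('aaabbbccc'),
    'is_min': np.zeros(9),
    'numbers': [0.3, -0.2, -1.2, 1.0, -0.2, -1.9, 0.0, -0.8, 0.7],
})

# Step 1: for every row, the index label of the minimum in that row's group.
labels = df.groupby('letters')['numbers'].transform('idxmin')

# Step 2: .loc accepts the (repeated) labels and sets is_min on those rows.
df.loc[labels, 'is_min'] = 1

marked_rows = df.index[df['is_min'] == 1].tolist()
print(marked_rows)
```

Since idxmin returns index labels, this relies on the DataFrame having unique index labels, as the default RangeIndex does.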

Not sure if it matters much, but you could call unique so that each row label is passed to loc only once, which may be faster:

df.loc[df.groupby('letters')['numbers'].transform('idxmin').unique(), 'is_min']=1
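If only one label per group is needed, calling idxmin on the grouped column directly should give the same labels without transform or unique. This is a sketch of that variant, not part of the original answer; the data is made up and min_labels is an illustrative name.

```python
import numpy as np
import pandas as pd

# Fixed, made-up numbers; minima are at rows 2, 5 and 7.
df = pd.DataFrame({
    'letters': list('aaabbbccc'),
    'is_min': np.zeros(9),
    'numbers': [0.3, -0.2, -1.2, 1.0, -0.2, -1.9, 0.0, -0.8, 0.7],
})

# One entry per group: the index label of that group's minimum.
min_labels = df.groupby('letters')['numbers'].idxmin()

# Pass the bare labels (not the letter-indexed Series) to .loc.
df.loc[min_labels.to_numpy(), 'is_min'] = 1
print(df)
```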

answered Sep 21 '22 18:09

EdChum

I would like to set the 'is_min' col to 1 if 'numbers' is the minimum value by column 'letters'.

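A perhaps more intuitive method is to calculate the minimum per group of letters, then call .apply on the resulting Series of minima to assign is_min:

def set_is_min(m):
    # mark the rows whose 'numbers' value equals the minimum m
    df.loc[df.numbers == m, 'is_min'] = 1

# .apply is used here only for its side effect on df
df.groupby('letters').numbers.min().apply(set_is_min)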
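One caveat with set_is_min: it compares each minimum against every row of df, so a value that is the minimum of one group but also occurs in another group would be flagged there too. A group-aware alternative, sketched here with made-up data in which -0.8 is the minimum of 'c' and also appears in 'a', is to broadcast each group's minimum with transform('min') and compare row by row. The name group_min is only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: -0.8 is the minimum of group 'c' and also occurs in group 'a'.
df = pd.DataFrame({
    'letters': list('aaabbbccc'),
    'is_min': np.zeros(9),
    'numbers': [0.3, -0.8, -1.2, 1.0, -0.2, -1.9, 0.0, -0.8, 0.7],
})

# Each row gets the minimum of its own group, aligned with df's index.
group_min = df.groupby('letters')['numbers'].transform('min')

# A row is flagged only if it equals the minimum of its own group.
df['is_min'] = (df['numbers'] == group_min).astype(int)
print(df)
```

Unlike idxmin, which picks a single label per group, this flags every row that ties for the group minimum.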

On large DataFrames, this method was roughly 20% faster than using transform in my timings:

# timeit with 100'000 rows
# .apply on group minima
100 loops, best of 3: 16.7 ms per loop
# .transform
10 loops, best of 3: 21.9 ms per loop
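A rough sketch of how such a comparison could be reproduced with the standard timeit module follows. The setup behind the numbers above is not shown, so everything here is an assumption: three groups, 100,000 random rows, a fixed seed, and the hypothetical helper names via_transform and via_group_minima. Timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed benchmark data: 100,000 rows spread over three groups.
rng = np.random.default_rng(0)
n = 100_000
base = pd.DataFrame({
    'letters': rng.choice(list('abc'), size=n),
    'is_min': np.zeros(n),
    'numbers': rng.standard_normal(n),
})

def via_transform(df):
    # transform('idxmin') + unique, then set by label
    labels = df.groupby('letters')['numbers'].transform('idxmin').unique()
    df.loc[labels, 'is_min'] = 1
    return df

def via_group_minima(df):
    # .apply over the Series of group minima, used for its side effect
    def set_is_min(m):
        df.loc[df.numbers == m, 'is_min'] = 1
    df.groupby('letters').numbers.min().apply(set_is_min)
    return df

loops = 10
for fn in (via_transform, via_group_minima):
    # Each run works on a fresh copy; the copy cost is included for both methods.
    seconds = timeit.timeit(lambda: fn(base.copy()), number=loops)
    print(f'{fn.__name__}: {seconds / loops * 1000:.1f} ms per loop')
```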

I ran some more benchmarks of various methods using apply and transform.

answered Sep 20 '22 18:09

miraculixx
