Is there an alternative, faster approach than idxmax? [duplicate]

Question

import time
import numpy as np
import pandas as pd

np.random.seed(0)
# 500000 rows spread over up to 7000 groups
df = pd.DataFrame({'gr': np.random.choice(7000, 500000),
                   'col': np.random.choice(1000, 500000)})
groups = df.groupby('gr')
t1 = time.time()
idx = groups.col.idxmax()
print(round(time.time() - t1, 1))
0.7

Is there a way to get these indices significantly faster than with idxmax()?

Note: I am only interested in idx.values; I don't mind losing idx.index.

2 revs, 2 users 94% · Accepted Answer

On my machine, sort_values followed by drop_duplicates is roughly 7 times faster than groupby idxmax (67.3 ms vs. 491 ms):

%timeit df.sort_values(['gr','col']).drop_duplicates('gr',keep='last').index
10 loops, best of 3: 67.3 ms per loop
%timeit df.groupby('gr').col.idxmax()
1 loop, best of 3: 491 ms per loop
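A quick sanity check may be worth running before swapping one approach for the other. The sketch below reuses the data from the question and checks that both methods pick rows holding the per-group maximum. The variable names (idx_ref, idx_fast, same_max, n_label_mismatch) are my own, not from the question. One caveat: when a group contains tied maxima, the two approaches are not guaranteed to return the same index label, so the count of differing labels is reported separately.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'gr': np.random.choice(7000, 500000),
                   'col': np.random.choice(1000, 500000)})

# Reference result: index label of the first maximum in each group.
idx_ref = df.groupby('gr').col.idxmax()

# Sort-based result: after sorting by (gr, col), the last row of each
# group holds that group's maximum.
idx_fast = (df.sort_values(['gr', 'col'])
              .drop_duplicates('gr', keep='last')
              .index)

# Both results are ordered by 'gr', so they can be compared position by position.
max_ref = df.loc[idx_ref.values, 'col'].to_numpy()
max_fast = df.loc[idx_fast, 'col'].to_numpy()
same_max = np.array_equal(max_ref, max_fast)

# The labels themselves can differ where a group has tied maxima.
n_label_mismatch = int((idx_ref.values != idx_fast.values).sum())

print(same_max, n_label_mismatch)
```

If only the maximum values matter, a nonzero mismatch count is harmless. If the exact labels that idxmax returns are required, the mismatching groups would need separate handling.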

Tags:

python

pandas

Tony