Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

return a series of numeric positions for the indices representing the max within each group

Tags:

python

pandas

Consider the series:

np.random.seed([3,1415])
s = pd.Series(np.random.rand(100),
              pd.MultiIndex.from_product([list('ABDCE'),
                                          list('abcde'),
                                          ['One', 'Two', 'Three', 'Four']]))

I can groupby combinations of index levels and get the idxmax:

s.groupby(level=[0, 2]).idxmax()

A  Four      (A, c, Four)
   One        (A, d, One)
   Three    (A, c, Three)
   Two        (A, d, Two)
B  Four      (B, d, Four)
   One        (B, d, One)
   Three    (B, c, Three)
   Two        (B, b, Two)
C  Four      (C, b, Four)
   One        (C, a, One)
   Three    (C, a, Three)
   Two        (C, e, Two)
D  Four      (D, b, Four)
   One        (D, e, One)
   Three    (D, b, Three)
   Two        (D, c, Two)
E  Four      (E, c, Four)
   One        (E, a, One)
   Three    (E, c, Three)
   Two        (E, a, Two)
dtype: object

I want the numeric position of each of these within each group.

I can get the numeric positions via the awesome answers to this question

s.groupby(level=[0, 2]).idxmax().apply(lambda x: s.index.get_loc(x))

A  Four     11
   One      12
   Three    10
   Two      13
B  Four     35
   One      32
   Three    30
   Two      25
C  Four     67
   One      60
   Three    62
   Two      77
D  Four     47
   One      56
   Three    46
   Two      49
E  Four     91
   One      80
   Three    90
   Two      81
dtype: int64

But I want this instead:

A  Four     2
   One      3
   Three    2
   Two      3
B  Four     3
   One      3
   Three    2
   Two      1
C  Four     1
   One      0
   Three    0
   Two      4
D  Four     1
   One      4
   Three    1
   Two      2
E  Four     2
   One      0
   Three    2
   Two      0
dtype: int64
like image 677
piRSquared Avatar asked Dec 25 '22 03:12

piRSquared


2 Answers

Well I finally have a solution, which uses NumPy's reshaping method and then operates along one of the axes to give us argmax. I am not sure if this is elegant, but I am hoping would be good in terms of performance. Also, I am assuming that pandas Series for multi-index data has a regular format, i.e. each level maintains the number of elements across all indices.

Here's the implementation -

L0,L1,L2 = s.index.levels[:3]
IDs = s.sortlevel().values.reshape(-1,len(L0),len(L1),len(L2)).argmax(2)
sOut = pd.Series(IDs.ravel(),pd.MultiIndex.from_product([L0,L2]))

Timing (complements of pir)

enter image description here

like image 132
Divakar Avatar answered Jan 04 '23 22:01

Divakar


This is how I did it:

s.groupby(level=[0, 2]).apply(lambda x: x.index.get_loc(x.idxmax()))

A  Four     2
   One      3
   Three    2
   Two      3
B  Four     3
   One      3
   Three    2
   Two      1
C  Four     1
   One      0
   Three    0
   Two      4
D  Four     1
   One      4
   Three    1
   Two      2
E  Four     2
   One      0
   Three    2
   Two      0
dtype: int64
like image 40
piRSquared Avatar answered Jan 04 '23 23:01

piRSquared