Multiple indexing with multiple idxmin() and idmax() in one aggregate in pandas

Question

In R data.table it is possible and easy to aggregate on multiple columns using argmin or argmax functions in one aggregate. For example for DT:

> DT = data.table(id=c(1,1,1,2,2,2,2,3,3,3), col1=c(1,3,5,2,5,3,6,3,67,7), col2=c(4,6,8,3,65,3,5,4,4,7), col3=c(34,64,53,5,6,2,4,6,4,67))
> DT
    id col1 col2 col3
 1:  1    1    4   34
 2:  1    3    6   64
 3:  1    5    8   53
 4:  2    2    3    5
 5:  2    5   65    6
 6:  2    3    3    2
 7:  2    6    5    4
 8:  3    3    4    6
 9:  3   67    4    4
10:  3    7    7   67

> DT_agg = DT[, .(agg1 = col1[which.max(col2)]
                , agg2 = col2[which.min(col3)]
                , agg3 = col1[which.max(col3)])
              , by= id]
> DT_agg
   id agg1 agg2 agg3
1:  1    5    4    3
2:  2    5    3    5
3:  3    7    4    7

agg1 is value of col1 where value of col2 is maximum, grouped by id.

agg2 is value of col2 where value of col3 is minimum, grouped by id.

agg3 is value of col1 where value of col3 is maximum, grouped by id.

how is this possible in Pandas, doing all three aggregates in one aggregate operation using groupby and agg? I can't figure out how to incorporate three different indexing in one agg function in Python. here's the dataframe in Python:

DF =pd.DataFrame({'id':[1,1,1,2,2,2,2,3,3,3], 'col1':[1,3,5,2,5,3,6,3,67,7], 'col2':[4,6,8,3,65,3,5,4,4,7], 'col3':[34,64,53,5,6,2,4,6,4,67]})

DF
Out[70]: 
   id  col1  col2  col3
0   1     1     4    34
1   1     3     6    64
2   1     5     8    53
3   2     2     3     5
4   2     5    65     6
5   2     3     3     2
6   2     6     5     4
7   3     3     4     6
8   3    67     4     4
9   3     7     7    67

Ch3steR · Accepted Answer

You can try this,

DF.groupby('id').agg(agg1=('col1',lambda x:x[DF.loc[x.index,'col2'].idxmax()]),
                     agg2 = ('col2',lambda x:x[DF.loc[x.index,'col3'].idxmin()]),
                     agg3 = ('col1',lambda x:x[DF.loc[x.index,'col3'].idxmax()]))

    agg1  agg2  agg3
id
1      5     4     3
2      5     3     5
3      7     4     7

Panwen Wang · Answer

How about a tidyverse way in python:

>>> from datar.all import f, tibble, group_by, which_max, which_min, summarise
>>> 
>>> DF = tibble(
...     id=[1,1,1,2,2,2,2,3,3,3], 
...     col1=[1,3,5,2,5,3,6,3,67,7],
...     col2=[4,6,8,3,65,3,5,4,4,7], 
...     col3=[34,64,53,5,6,2,4,6,4,67]
... )
>>> 
>>> DF >> group_by(f.id) >> summarise(
...     agg1=f.col1[which_max(f.col2)],
...     agg2=f.col2[which_min(f.col3)],
...     agg3=f.col1[which_max(f.col3)]
... )
       id    agg1    agg2    agg3
  <int64> <int64> <int64> <int64>
0       1       5       4       3
1       2       5       3       5
2       3       7       4       7

I am the author of the datar package. Feel free to submit issues if you have any questions.

Multiple indexing with multiple idxmin() and idmax() in one aggregate in pandas

Tags:

python

python-3.x

pandas

r

data.table

Ankhnesmerira

2 Answers

Ch3steR

Panwen Wang

Recent Activity

Donate For Us

Multiple indexing with multiple idxmin() and idmax() in one aggregate in pandas

Tags:

python

python-3.x

pandas

r

data.table

Ankhnesmerira

2 Answers

Ch3steR

Panwen Wang

Related questions

Recent Activity

Donate For Us