 

pandas groupby where you get the max of one column and the min of another column

I have a dataframe as follows:

user    num1    num2
a       1       1
a       2       2
a       3       3
b       4       4
b       5       5

I want a dataframe that has the minimum of num1 and the maximum of num2 for each user.

The output should look like this:

user    num1    num2
a       1       3
b       4       5

I know that if I wanted the max of both columns, I could just do:

a.groupby('user')[['num1', 'num2']].max()
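For reference, here is a minimal sketch that rebuilds the example frame and takes the max of both columns. It selects the columns with a list (double brackets); as far as I know, the tuple-style `['num1', 'num2']` selection after `groupby` was deprecated and raises an error in pandas 2.x, so the list form is the safer one to use.

```python
import pandas as pd

# Rebuild the example frame from the question
a = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "num1": [1, 2, 3, 4, 5],
    "num2": [1, 2, 3, 4, 5],
})

# Select several columns with a list (double brackets), then aggregate per user
both_max = a.groupby("user")[["num1", "num2"]].max()
print(both_max)
```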

Is there an equivalent that avoids having to do something like this:

import numpy as np
import pandas as pd

series_1 = a.groupby('user')['num1'].min()
series_2 = a.groupby('user')['num2'].max()

# converting from series to df so I can do a join on user
df_1 = pd.DataFrame(np.array([series_1]).transpose(), index=series_1.index, columns=['num1'])
df_2 = pd.DataFrame(np.array([series_2]).transpose(), index=series_2.index, columns=['num2'])

df_1.join(df_2)
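As a side note, the NumPy round-trip in the workaround is not needed to combine the two Series. A sketch using `pd.concat` (a swapped-in technique, not what the question used): concatenating along `axis=1` aligns the Series on their shared `user` index and uses each Series' name as the column name.

```python
import pandas as pd

a = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "num1": [1, 2, 3, 4, 5],
    "num2": [1, 2, 3, 4, 5],
})

# Each per-group aggregation returns a Series named after its source column
series_1 = a.groupby("user")["num1"].min()
series_2 = a.groupby("user")["num2"].max()

# Side-by-side concat aligns on the "user" index; Series names become columns
combined = pd.concat([series_1, series_2], axis=1)
print(combined)
```

`series_1.to_frame().join(series_2)` should give the same result if you prefer to keep the `join`.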
asked Jun 06 '17 06:06 by lhay86



People also ask

How do you use Groupby and Max in pandas?

Group the DataFrame on the column(s) you want, select the field(s) whose maximum you want, and then either call max() directly or pass 'max' to agg().
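A minimal sketch of those steps on the question's data, showing that calling `max()` directly and passing `"max"` to `agg()` produce the same result:

```python
import pandas as pd

df = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "num1": [1, 2, 3, 4, 5],
    "num2": [1, 2, 3, 4, 5],
})

# Group on "user", select "num2", then apply max() directly...
direct = df.groupby("user")["num2"].max()

# ...or pass the string "max" to agg()
via_agg = df.groupby("user")["num2"].agg("max")

print(direct.equals(via_agg))  # True
```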

Can you use Groupby with multiple columns in pandas?

groupby() accepts a list of columns, so you can group by several columns at once, and agg() lets you apply one or several aggregations at the same time.
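A small sketch of grouping by two keys and computing several aggregations in one call. The `kind` column is hypothetical, added only so there is a second grouping key.

```python
import pandas as pd

# Hypothetical frame: "kind" is a made-up second grouping key
df = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "kind": ["x", "x", "y", "x", "x"],
    "num1": [1, 2, 3, 4, 5],
})

# Pass a list of columns to groupby(), a list of aggregations to agg()
summary = df.groupby(["user", "kind"])["num1"].agg(["min", "max", "sum"])
print(summary)
```

The result has a MultiIndex of `(user, kind)` pairs and one column per aggregation.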

How do I get the minimum of two columns in pandas?

Call min() on a Series to find its minimum value. To get the index label of the minimum instead, use idxmin(): Series.idxmin() and DataFrame.idxmin() return the label of the first occurrence of the minimum along the requested axis.
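A short sketch of `min()` and `idxmin()` on a small made-up frame, plus a row-wise minimum across two columns with `min(axis=1)`:

```python
import pandas as pd

# Made-up frame with string index labels so idxmin() results are easy to read
df = pd.DataFrame({"num1": [3, 1, 2], "num2": [5, 4, 6]}, index=["r0", "r1", "r2"])

# Smallest value in one column (a Series)
print(df["num1"].min())     # 1

# Index label of the first occurrence of that minimum
print(df["num1"].idxmin())  # r1

# On a DataFrame, idxmin() works column by column by default (axis=0)
print(df.idxmin())

# Minimum of two columns for each row
row_min = df[["num1", "num2"]].min(axis=1)
print(row_min)
```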


2 Answers

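Use groupby + agg with a dict that maps each column to its aggregation. The result's columns are not guaranteed to follow the dict order in every pandas/Python version, so select the columns in the desired order (with a column subset or by reindexing). Finally, add reset_index to turn the user index back into a column, if needed.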

df = a.groupby('user').agg({'num1':'min', 'num2':'max'})[['num1','num2']].reset_index()
print(df)
  user  num1  num2
0    a     1     3
1    b     4     5
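On newer pandas (0.25 and later, as far as I know), named aggregation is another option. This is a sketch of that alternative, not part of the answer above: each keyword is an output column mapped to an `(input_column, aggregation)` pair, so the column order is explicit, and `as_index=False` keeps `user` as a regular column instead of calling `reset_index()`.

```python
import pandas as pd

a = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "num1": [1, 2, 3, 4, 5],
    "num2": [1, 2, 3, 4, 5],
})

# output_column=(input_column, aggregation); keyword order sets column order
df = a.groupby("user", as_index=False).agg(
    num1=("num1", "min"),
    num2=("num2", "max"),
)
print(df)
```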

This is the same as:

# reindex_axis was removed in pandas 1.0; reindex(columns=...) does the same job
df = (a.groupby('user').agg({'num1':'min', 'num2':'max'})
       .reindex(columns=['num1','num2'])
       .reset_index())
print(df)
  user  num1  num2
0    a     1     3
1    b     4     5
answered Oct 06 '22 23:10 by jezrael



To add to @jezrael's answer: if you want the first and last values of specific columns, you can use the same approach:

df.groupby(['user']).agg({'num1':'min', 'num2':'max', 'num3':'first', 'num4':'last', 'num5':'sum'})
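A runnable sketch of that snippet. The example frame in the question has no `num3` to `num5`, so those columns are hypothetical, filled with made-up values here. Note that `first` and `last` return the first and last non-null value in each group.

```python
import pandas as pd

# Hypothetical frame: num3..num5 are made-up columns so the snippet runs
df = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "num1": [1, 2, 3, 4, 5],
    "num2": [1, 2, 3, 4, 5],
    "num3": [10, 20, 30, 40, 50],
    "num4": [10, 20, 30, 40, 50],
    "num5": [10, 20, 30, 40, 50],
})

# One aggregation per column: min, max, first, last and sum within each user
out = df.groupby(["user"]).agg(
    {"num1": "min", "num2": "max", "num3": "first", "num4": "last", "num5": "sum"}
)
print(out)
```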

answered Oct 06 '22 23:10 by Mihir Thakur
