I have a dataframe as follows:
user num1 num2
a 1 1
a 2 2
a 3 3
b 4 4
b 5 5
I want a dataframe which has the minimum from num1 for each user, and the maximum of num2 for each user.
The output should be like:
user num1 num2
a 1 3
b 4 5
I know that if I wanted the max of both columns I could just do:
a.groupby('user')[['num1', 'num2']].max()
Is there some equivalent without having to do something like:
series_1 = a.groupby('user')['num1'].min()
series_2 = a.groupby('user')['num2'].max()
# converting from series to df so I can do a join on user
df_1 = pd.DataFrame(np.array([series_1]).transpose(), index=series_1.index, columns=['num1'])
df_2 = pd.DataFrame(np.array([series_2]).transpose(), index=series_2.index, columns=['num2'])
df_1.join(df_2)
Pandas groupby maximum, step by step: group the dataframe on the column(s) you want, select the field(s) for which you want to compute the maximum, then apply the pandas max() function directly or pass 'max' to the agg() function.
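The three steps above can be sketched as follows, using the sample data from the question. This is a minimal illustration; the variable names max_direct and max_via_agg are made up for the example.

```python
import pandas as pd

# Sample data from the question.
a = pd.DataFrame({
    'user': ['a', 'a', 'a', 'b', 'b'],
    'num1': [1, 2, 3, 4, 5],
    'num2': [1, 2, 3, 4, 5],
})

# Step 1: group on 'user'. Step 2: select the columns. Step 3: take the maximum.
max_direct = a.groupby('user')[['num1', 'num2']].max()

# Same result, passing the function name to agg() instead.
max_via_agg = a.groupby('user')[['num1', 'num2']].agg('max')

print(max_direct)
```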
How do you group by multiple columns in a pandas DataFrame and compute multiple aggregations? groupby() accepts a list of columns to group by, and agg() can then apply one or several aggregation functions at the same time.
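As a rough sketch of grouping on several columns with several aggregations at once: the 'team' column and the df_multi frame below are hypothetical and do not appear in the question.

```python
import pandas as pd

# Hypothetical data: 'team' is an extra grouping column invented for this example.
df_multi = pd.DataFrame({
    'team': ['x', 'x', 'x', 'y', 'y'],
    'user': ['a', 'a', 'b', 'b', 'b'],
    'num1': [1, 2, 3, 4, 5],
})

# Group by two columns and apply three aggregations to num1 in one call.
stats = df_multi.groupby(['team', 'user'])['num1'].agg(['min', 'max', 'sum'])
print(stats)
```

The result has one row per (team, user) pair and one column per aggregation.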
Use the min() function on a Series to find its minimum value. To get the row index label of the minimum instead, use DataFrame.idxmin(): it returns the index of the first occurrence of the minimum over the requested axis.
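A small sketch of the difference between min() and idxmin(), again on the question's data. Combining idxmin() with loc to pull out whole rows is one possible use, not something the question asks for.

```python
import pandas as pd

# Sample data from the question.
a = pd.DataFrame({
    'user': ['a', 'a', 'a', 'b', 'b'],
    'num1': [1, 2, 3, 4, 5],
    'num2': [1, 2, 3, 4, 5],
})

# min() returns the smallest value itself.
smallest = a['num1'].min()

# idxmin() returns the index label of the first row holding that value.
first_min_label = a['num1'].idxmin()

# Per user: look up the full row in which num1 is smallest.
min_rows = a.loc[a.groupby('user')['num1'].idxmin()]
print(min_rows)
```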
Use groupby + agg with a dict. The columns then need to be put in order, either by subsetting or with reindex (reindex_axis in older pandas versions). Finally, add reset_index to convert the index to a column if necessary.
df = a.groupby('user').agg({'num1':'min', 'num2':'max'})[['num1','num2']].reset_index()
print (df)
user num1 num2
0 a 1 3
1 b 4 5
This is the same as:
df = (a.groupby('user').agg({'num1':'min', 'num2':'max'})
       .reindex(columns=['num1','num2'])  # reindex_axis was removed in pandas 1.0
       .reset_index())
print (df)
user num1 num2
0 a 1 3
1 b 4 5
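For reference, here is a self-contained version of the same idea that can be run as-is. It uses keyword-style named aggregation rather than the dict form from the answer; this is a different spelling, assumed to be available (pandas 0.25 or later), and it fixes the output column order without a separate reordering step.

```python
import pandas as pd

# Sample data from the question.
a = pd.DataFrame({
    'user': ['a', 'a', 'a', 'b', 'b'],
    'num1': [1, 2, 3, 4, 5],
    'num2': [1, 2, 3, 4, 5],
})

# Named aggregation: output_column=(input_column, function).
result = (
    a.groupby('user')
     .agg(num1=('num1', 'min'), num2=('num2', 'max'))
     .reset_index()  # turn the 'user' index back into a column
)
print(result)
```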
To add to @jezrael's answer: if you also want the first and last values of specific columns, you can get them the same way:
df.groupby(['user']).agg({'num1':'min', 'num2':'max', 'num3':'first', 'num4':'last', 'num5':'sum'})
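A runnable sketch of that call follows. The columns num3, num4 and num5 are not in the question's data, so the values below are invented purely for illustration.

```python
import pandas as pd

# num1 and num2 come from the question; num3, num4 and num5 are made-up columns.
wide = pd.DataFrame({
    'user': ['a', 'a', 'a', 'b', 'b'],
    'num1': [1, 2, 3, 4, 5],
    'num2': [1, 2, 3, 4, 5],
    'num3': [10, 20, 30, 40, 50],
    'num4': [10, 20, 30, 40, 50],
    'num5': [1, 1, 1, 1, 1],
})

# One aggregation per column: 'first' and 'last' follow the row order within each group.
summary = wide.groupby(['user']).agg(
    {'num1': 'min', 'num2': 'max', 'num3': 'first', 'num4': 'last', 'num5': 'sum'}
)
print(summary)
```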