pandas groupby where you get the max of one column and the min of another column

Tags: python, pandas, pandas-groupby

I have a dataframe as follows:

user    num1    num2
a       1       1
a       2       2
a       3       3
b       4       4
b       5       5
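
For anyone wanting to reproduce this, the frame above can be built roughly as follows. The variable name `a` is taken from the code further down; the integer dtypes are an assumption.

```python
import pandas as pd

# Sample data copied from the table above; `a` matches the name used later on.
a = pd.DataFrame({
    'user': ['a', 'a', 'a', 'b', 'b'],
    'num1': [1, 2, 3, 4, 5],
    'num2': [1, 2, 3, 4, 5],
})
print(a)
```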

I want a dataframe which has the minimum from num1 for each user, and the maximum of num2 for each user.

The output should be like:

user    num1    num2
a       1       3
b       4       5

I know that if I wanted the max of both columns I could just do:

a.groupby('user')[['num1', 'num2']].max()

Is there some equivalent without having to do something like:

series_1 = a.groupby('user')['num1'].min() 
series_2 = a.groupby('user')['num2'].max()

# converting from series to df so I can do a join on user
df_1 = pd.DataFrame(np.array([series_1]).transpose(), index=series_1.index, columns=['num1']) 
df_2 = pd.DataFrame(np.array([series_2]).transpose(), index=series_2.index, columns=['num2'])

df_1.join(df_2)

asked Jun 06 '17 06:06

lhay86

2 Answers

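Use groupby with agg and a dict that maps each column to its aggregation function. If the result columns do not come out in the order you want, fix the order by selecting a column subset or with reindex (reindex_axis in older pandas versions). Finally, add reset_index to turn the index into a column if needed.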

df = a.groupby('user').agg({'num1':'min', 'num2':'max'})[['num1','num2']].reset_index()
print (df)
  user  num1  num2
0    a     1     3
1    b     4     5

This is the same as:

df = (a.groupby('user').agg({'num1':'min', 'num2':'max'})
       # reindex_axis was removed in pandas 1.0; reindex(columns=...) replaces it
       .reindex(columns=['num1','num2'])
       .reset_index())
print (df)
  user  num1  num2
0    a     1     3
1    b     4     5
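
As a possible alternative that is not part of the original answer: pandas 0.25 and later also support named aggregation, where each keyword argument names an output column and the columns come out in the order written. A minimal sketch, assuming the sample frame from the question is rebuilt as `a`:

```python
import pandas as pd

# Sample frame from the question, rebuilt here so the snippet runs on its own.
a = pd.DataFrame({
    'user': ['a', 'a', 'a', 'b', 'b'],
    'num1': [1, 2, 3, 4, 5],
    'num2': [1, 2, 3, 4, 5],
})

# Named aggregation: output_column=(input_column, function).
df = (
    a.groupby('user')
     .agg(num1=('num1', 'min'), num2=('num2', 'max'))
     .reset_index()
)
print(df)
```

With this form no extra column-ordering step should be needed, and the output columns can be given different names from the input columns if desired.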

answered Oct 06 '22 23:10

jezrael

To add to @jezrael's answer: if anyone wants the first and last values of specific columns, it can be done the same way (num3, num4 and num5 here are extra columns that are not in the question's frame):

df.groupby(['user']).agg({'num1':'min', 'num2':'max', 'num3':'first', 'num4':'last', 'num5':'sum'})
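
To try the line above, here is a small self-contained sketch. The values in num3, num4 and num5 are made up for illustration, since the question's frame only has num1 and num2.

```python
import pandas as pd

# Hypothetical data: num3, num4 and num5 are invented for this example.
df = pd.DataFrame({
    'user': ['a', 'a', 'a', 'b', 'b'],
    'num1': [1, 2, 3, 4, 5],
    'num2': [1, 2, 3, 4, 5],
    'num3': [10, 20, 30, 40, 50],
    'num4': [10, 20, 30, 40, 50],
    'num5': [1, 1, 1, 2, 2],
})

# One aggregation per column: min, max, first row, last row and sum per user.
out = df.groupby(['user']).agg(
    {'num1': 'min', 'num2': 'max', 'num3': 'first', 'num4': 'last', 'num5': 'sum'}
)
print(out)
```

Note that 'first' and 'last' depend on the row order within each group, so sort the frame beforehand if that order matters.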

answered Oct 06 '22 23:10

Mihir Thakur
