Using Pandas to Find Minimum Values of Grouped Rows

This might be a trivial question but I'm still trying to figure out pandas/numpy.

So, suppose I have a table with the following structure:

group_id | col1 | col2 | col3 |  "A"   |  "B"
   x     |   1  |   2  |  3   |  NaN   |   1
   x     |   3  |   2  |  3   |   1    |   1 
   x     |   4  |   2  |  3   |   2    |   1
   y     |   1  |   2  |  3   |  NaN   |   3 
   y     |   3  |   2  |  3   |   3    |   3 
   z     |   3  |   2  |  3   |   10   |   2
   z     |   2  |   2  |  3   |   6    |   2
   z     |   4  |   2  |  3   |   4    |   2
   z     |   4  |   2  |  3   |   2    |   2

Note that group_id assigns each row to a group. At the beginning, I only have the values for columns group_id and col1-col3.

Then, for each row, if any of col1, col2, or col3 equals 1, "A" is NaN; otherwise its value comes from a formula (irrelevant here, so I put in placeholder numbers).

That, I know how to do using:

df["A"] = np.where((df['col1'] == 1) | (df['col2'] == 1) | (df['col3'] == 1), np.nan, value)

But for column "B", I need to fill it in with the minimum of values from column A for a specific group.

So, for example, "B" is equal to 1 for all rows in group "x" because the minimum value in column A across the group "x" rows is 1.

Similarly, for rows in group "y", the minimum value is 3, and for group "z" the minimum value is 2. How exactly do I do that using pandas...? It's confusing me a little more because the number of rows for a specific group can be of varying size.

If they were all the same size I could just say fill it with the minimum of values in a pre-set range.

I hope that made sense; please let me know if I should provide a clearer example or clarify anything!

shishy asked Jan 03 '17 19:01


3 Answers

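To get the minimum of column A for each group, use transform. Unlike a plain aggregation, it returns a result aligned with the original rows, so it can be assigned directly to column "B":

df['B'] = df.groupby('group_id')['A'].transform('min')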
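
As a quick check, here is a minimal sketch that rebuilds the group_id and "A" columns from the question's table and fills "B" with transform. The DataFrame construction is my own reconstruction of the table, not code from the question.

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's table (only the columns needed here)
df = pd.DataFrame({
    "group_id": list("xxxyyzzzz"),
    "A": [np.nan, 1, 2, np.nan, 3, 10, 6, 4, 2],
})

# transform('min') broadcasts each group's minimum back to every row
# of that group; NaN values are skipped when computing the minimum
df["B"] = df.groupby("group_id")["A"].transform("min")
print(df)
```

Because the result has the same index as df, it works regardless of how many rows each group has.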
Ted Petrou answered Nov 05 '22 20:11


  • select just the columns ['col1', 'col2', 'col3']
  • test whether each value equals 1 with eq(1), which is equivalent to == 1
  • check whether any value in a row equals 1 with any(axis=1)
  • use loc to assign NaN to "A" in those rows

anyone = df[['col1', 'col2', 'col3']].eq(1).any(axis=1)
df.loc[anyone, 'A'] = np.nan

NumPy equivalent:

anyone = (df[['col1', 'col2', 'col3']].values == 1).any(axis=1)
df['A'] = np.where(anyone, np.nan, df['A'])
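
To see the pandas version of the masking end to end, here is a small sketch using the col1-col3 values from the question's table. The initial "A" values are placeholders standing in for the question's unspecified formula, so treat them as an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 3, 4, 1, 3, 3, 2, 4, 4],
    "col2": [2] * 9,
    "col3": [3] * 9,
    # Placeholder values: the real formula is not given in the question
    "A": [0.0, 1, 2, 0, 3, 10, 6, 4, 2],
})

# Boolean mask: True for rows where any of col1..col3 equals 1
anyone = df[["col1", "col2", "col3"]].eq(1).any(axis=1)

# Overwrite "A" with NaN only in the masked rows
df.loc[anyone, "A"] = np.nan
print(df)
```

Only the first row of group "x" and the first row of group "y" have col1 equal to 1, so those are the two rows that end up with NaN in "A".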
piRSquared answered Nov 05 '22 20:11


df.groupby('group_id')['A'].min()
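
Note that min() returns one value per group (indexed by group_id) rather than one value per row. A possible way to fill "B" from it, sketched here on my own reconstruction of the question's table, is to map each row's group_id to its group minimum:

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's table (only the columns needed here)
df = pd.DataFrame({
    "group_id": list("xxxyyzzzz"),
    "A": [np.nan, 1, 2, np.nan, 3, 10, 6, 4, 2],
})

# One minimum per group; the resulting Series is indexed by group_id
group_min = df.groupby("group_id")["A"].min()

# Look up each row's group_id in group_min to get a per-row column
df["B"] = df["group_id"].map(group_min)
print(group_min)
print(df)
```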
Zhaoyun Ma answered Nov 05 '22 19:11