Pandas: vectorized operations on maximum values per row

I have the following pandas dataframe df:

index        A    B    C
    1        1    2    3
    2        9    5    4
    3        7    12   8
    ...      ...  ...  ...

I want the maximum value of each row to remain unchanged and all the other values to become -1. The output would thus look like this:

index        A    B    C
    1       -1   -1    3
    2        9   -1   -1
    3       -1    12  -1
    ...      ...  ...  ...

Using df.max(axis=1), I get a pandas Series with the maximum value of each row. However, I'm not sure how best to use these maximums to create the result I need. I'm looking for a fast, vectorized implementation.

S Leon asked Mar 06 '16 21:03


1 Answer

Consider using where:

>>> df.where(df.eq(df.max(1), 0), -1)
       A   B  C
index          
1     -1  -1  3
2      9  -1 -1
3     -1  12 -1

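Here df.eq(df.max(1), 0) is a boolean DataFrame marking the row maximums. The second argument, 0, aligns the Series of row maximums with the row index, so each row is compared against its own maximum. where leaves the True cells (the maximums) untouched and replaces the False cells with -1. If a row has tied maximums, every tied cell is kept. The replacement does not have to be a scalar: a Series or another DataFrame works too.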
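To make the steps above easier to follow, here is a minimal self-contained sketch that rebuilds the three sample rows from the question and spells out the intermediate objects. The variable names row_max, is_max and result are only illustrative, and keyword arguments are used in place of the positional 1 and 0.

```python
import pandas as pd

# Sample data copied from the question (only the three rows shown there).
df = pd.DataFrame(
    {"A": [1, 9, 7], "B": [2, 5, 12], "C": [3, 4, 8]},
    index=pd.Index([1, 2, 3], name="index"),
)

# Series holding the maximum of each row.
row_max = df.max(axis=1)

# Boolean DataFrame: True where a cell equals its own row's maximum.
# axis=0 aligns row_max with the row index rather than the column labels.
is_max = df.eq(row_max, axis=0)

# Keep cells where is_max is True, replace everything else with -1.
result = df.where(is_max, -1)
print(result)
```

Writing the mask out as its own variable also makes it easy to inspect which cells were treated as maximums before anything is replaced.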

The operation can also be done in place by passing inplace=True, in which case where modifies df and returns None.
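As a rough sketch of those two variations, the snippet below uses a DataFrame as the replacement and then repeats the scalar replacement in place on a copy. The names fallback, negated and df_inplace are hypothetical, and using -df as the replacement is just an arbitrary choice for illustration.

```python
import pandas as pd

# Same three sample rows as in the question.
df = pd.DataFrame(
    {"A": [1, 9, 7], "B": [2, 5, 12], "C": [3, 4, 8]},
    index=[1, 2, 3],
)
is_max = df.eq(df.max(axis=1), axis=0)

# Replacement given as a DataFrame: each non-maximum cell takes the
# value at the same position in `fallback` (hypothetical example data).
fallback = -df
negated = df.where(is_max, fallback)
print(negated)

# In-place variant: modifies df_inplace directly and returns None.
# A copy is used here only so the original df stays available.
df_inplace = df.copy()
df_inplace.where(is_max, -1, inplace=True)
print(df_inplace)
```

The in-place form avoids binding a new name, but the non-in-place form is usually easier to chain with other operations.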

Alex Riley answered Nov 14 '22 22:11