How to replace just the first instance of the max value in a pandas DataFrame?

I have a dataframe that looks like this:

NAME   MONTH  TIME
Paul    Jan     3
Paul    Sept    1
Joe      Jan    3
Joe     Aug     3

And I transformed it to a df like this one, using pivot:

NAME JAN SEPT AUG 
Paul  3    1   0
Joe   3    0   3

Now I'm creating a new column with the biggest value for every row, and it looks like this:

NAME JAN SEPT AUG 1_MAX
Paul  3    1   0    3
Joe   3    0   3    3

Then, in a temporary dataframe, I assign 0 to the old biggest value so that I can get the second biggest value. The result should look like this:

NAME JAN SEPT AUG 1_MAX 2_MAX
Paul  3    1   0    3     1
Joe   3    0   3    3     3

But Joe has the value 3 twice, in Jan and Aug. When I assign 0 to the biggest value, only JAN (the first place the biggest value appears) should change, yet every instance of the max becomes 0. The result looks like this, which is not what I want:

NAME JAN SEPT AUG 1_MAX 2_MAX
Paul  3    1   0    3     1
Joe   3    0   3    3     0

I'm using:

f_temp1 = df_temp1.apply(lambda x: x.replace(max(x), 0), axis = 1)

to change the biggest value to zero, but this replaces every occurrence of the biggest value. I would like to replace the row's maximum only the first time it appears.

I need a generic solution because I'm working with a big dataframe.

asked Jan 17 '20 at 18:01 by kingjames23



2 Answers

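Use numpy to sort each row of the underlying array (assuming 'NAME' is the index) and join the N largest values back onto the frame. Sorting keeps duplicates, so a repeated maximum shows up in both columns.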

import pandas as pd
import numpy as np
N = 2

pd.concat([df, pd.DataFrame(np.sort(df.to_numpy(), axis=1)[:, -N:],
                            index=df.index,
                            columns=[f'{i}_MAX' for i in range(N, 0, -1)])],
           axis=1)

      JAN  SEPT  AUG  2_MAX  1_MAX
NAME                              
Paul    3     1    0      1      3
Joe     3     0    3      3      3
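
For anyone who wants to run the snippet above end to end, here is a minimal self-contained sketch. It assumes the frame is rebuilt from the question's long-format data with pivot, so the column and row order differ from the tables shown (pivot sorts labels alphabetically). The names long_df, top and result are illustrative only and do not come from the original posts.

```python
import numpy as np
import pandas as pd

# Long-format data copied from the question (month labels upper-cased).
long_df = pd.DataFrame({
    "NAME": ["Paul", "Paul", "Joe", "Joe"],
    "MONTH": ["JAN", "SEPT", "JAN", "AUG"],
    "TIME": [3, 1, 3, 3],
})

# Pivot so that NAME becomes the index; months with no record become 0.
df = (long_df.pivot(index="NAME", columns="MONTH", values="TIME")
             .fillna(0)
             .astype(int))

N = 2
# Sort every row in ascending order and keep the last N values.
# Duplicated maxima are preserved, so Joe gets 3 and 3.
top = pd.DataFrame(np.sort(df.to_numpy(), axis=1)[:, -N:],
                   index=df.index,
                   columns=[f"{i}_MAX" for i in range(N, 0, -1)])

result = pd.concat([df, top], axis=1)
print(result)
```

Raising N returns more of the largest values per row, as long as N does not exceed the number of month columns.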
answered Sep 28 '22 at 05:09 by ALollz


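Use np.unique if you want the two largest distinct values per row. Note that this collapses Joe's duplicate 3, so his 2_MAX is 0: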

df[['1_MAX','2_MAX']]=(df.loc[:,'JAN':]
                         .apply(lambda x: pd.Series(np.sort(np.unique(x))[-2:]),
                                axis=1)
                         .loc[:,[1,0]])
print(df)
   NAME  JAN  SEPT  AUG  1_MAX  2_MAX
0  Paul    3     1    0      3      1
1   Joe    3     0    3      3      0

Initial df

   NAME  JAN  SEPT  AUG
0  Paul    3     1    0
1   Joe    3     0    3
answered Sep 28 '22 at 04:09 by ansev
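
Neither answer does literally what the title asks, which is to replace only the first occurrence of each row's maximum. One possible sketch of that step uses numpy's argmax, which returns the position of the first maximum in each row. This is an alternative technique, not taken from either answer. It assumes NAME is the index and that all values are non-negative, so that 0 works as the replacement value as in the question. The names values, first_pos and out are illustrative.

```python
import numpy as np
import pandas as pd

# Pivoted frame from the question, with NAME as the index (an assumption).
df = pd.DataFrame(
    {"JAN": [3, 3], "SEPT": [1, 0], "AUG": [0, 3]},
    index=pd.Index(["Paul", "Joe"], name="NAME"),
)

# Work on a copy of the underlying array so df itself is left untouched.
values = df.to_numpy().copy()
rows = np.arange(len(values))

first_max = values.max(axis=1)

# argmax gives the column position of the FIRST maximum in each row.
first_pos = values.argmax(axis=1)

# Zero out exactly one cell per row: the first occurrence of the maximum.
values[rows, first_pos] = 0
second_max = values.max(axis=1)

out = df.copy()
out["1_MAX"] = first_max
out["2_MAX"] = second_max
print(out)
```

Because only one cell per row is zeroed, Joe's second 3 (in AUG) survives and is picked up as 2_MAX. The work is done on whole arrays instead of a row-wise apply, which usually matters on a big dataframe.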