I have a dataframe that looks like this:
NAME MONTH TIME
Paul Jan 3
Paul Sept 1
Joe Jan 3
Joe Aug 3
And I transformed it to a df like this one, using pivot:
NAME JAN SEPT AUG
Paul 3 1 0
Joe 3 0 3
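For reference, the pivot step can be sketched roughly as below. This is only an illustration: the variable names long_df and wide are made up here, and the fillna(0), upper-casing and explicit reordering are assumptions added to reproduce the table above, since pivot on its own leaves NaN for missing months and sorts the columns alphabetically.

```python
import pandas as pd

# Long-format data from the question (long_df is a hypothetical name).
long_df = pd.DataFrame({
    "NAME": ["Paul", "Paul", "Joe", "Joe"],
    "MONTH": ["Jan", "Sept", "Jan", "Aug"],
    "TIME": [3, 1, 3, 3],
})

# One row per NAME, one column per MONTH; missing combinations become 0.
wide = (long_df.pivot(index="NAME", columns="MONTH", values="TIME")
        .fillna(0)
        .astype(int))

# Upper-case the month labels and restore the row/column order shown above.
wide.columns = wide.columns.str.upper()
wide = wide.loc[["Paul", "Joe"], ["JAN", "SEPT", "AUG"]]
print(wide)
```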
Now I'm creating a new column with the biggest value for every row, and it looks like this:
NAME JAN SEPT AUG 1_MAX
Paul 3 1 0 3
Joe 3 0 3 3
Then, in a temporary dataframe, I assign 0 to the old biggest value so that I can get the second biggest value. The result should look like this:
NAME JAN SEPT AUG 1_MAX 2_MAX
Paul 3 1 0 3 1
Joe 3 0 3 3 3
But Joe has the value 3 twice, in Jan and in Aug. When I assign 0 to the biggest value, only JAN should change, because that is the first place the biggest value appears. Instead, every instance of the maximum becomes 0. The result is this, which is not what I want:
NAME JAN SEPT AUG 1_MAX 2_MAX
Paul 3 1 0 3 1
Joe 3 0 3 3 0
I'm using:
f_temp1 = df_temp1.apply(lambda x: x.replace(max(x), 0), axis = 1)
to change the biggest value to zero, but this replaces every occurrence of the biggest value. I would like to replace the maximum value of the row only the first time it appears.
I need a generic solution because I'm working in a big dataframe.
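One possible way to zero out only the first occurrence is to locate it by position instead of by value. The sketch below is an assumption-laden illustration, not a definitive answer: it rebuilds the small pivoted frame by hand with NAME in the index, uses numpy's argmax (which returns the first position of the maximum in each row), and relies on the values being non-negative, as the zeroing trick in the question already does.

```python
import numpy as np
import pandas as pd

# Pivoted frame from the question, with NAME as the index (assumed layout).
df = pd.DataFrame(
    {"JAN": [3, 3], "SEPT": [1, 0], "AUG": [0, 3]},
    index=pd.Index(["Paul", "Joe"], name="NAME"),
)

values = df.to_numpy(copy=True)         # work on a copy, df stays untouched
rows = np.arange(len(values))
first_max_pos = values.argmax(axis=1)   # column position of the FIRST max per row

result = df.copy()
result["1_MAX"] = values[rows, first_max_pos]

# Zero only that single cell in each row, then take the max again.
values[rows, first_max_pos] = 0
result["2_MAX"] = values.max(axis=1)
print(result)
```

Because everything is done on the underlying array, there is no row-wise apply, which may matter on a big dataframe.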
Use numpy to sort the underlying array (assuming 'NAME' is in the index) and join back the max values.
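import pandas as pd
import numpy as np

N = 2  # number of largest values to keep per row

# Sort each row in ascending order and keep the last N columns (duplicates are kept).
top_n = pd.DataFrame(np.sort(df.to_numpy(), axis=1)[:, -N:],
                     index=df.index,
                     columns=[f'{i}_MAX' for i in range(N, 0, -1)])
pd.concat([df, top_n], axis=1)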
JAN SEPT AUG 2_MAX 1_MAX
NAME
Paul 3 1 0 1 3
Joe 3 0 3 3 3
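If you want the two largest distinct values instead (np.unique drops the repeated 3 in Joe's row, so his 2_MAX is 0), use: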
df[['1_MAX','2_MAX']]=(df.loc[:,'JAN':]
.apply(lambda x: pd.Series(np.sort(np.unique(x))[-2:]),
axis=1)
.loc[:,[1,0]])
print(df)
NAME JAN SEPT AUG 1_MAX 2_MAX
0 Paul 3 1 0 3 1
1 Joe 3 0 3 3 0
Initial df
NAME JAN SEPT AUG
0 Paul 3 1 0
1 Joe 3 0 3