I have the following dataset:
Item Count
A 60
A 20
A 21
B 33
B 33
B 32
Code to reproduce:
import pandas as pd
df = pd.DataFrame([
['A', 60],
['A', 20],
['A', 21],
['B', 33],
['B', 33],
['B', 32],
],
columns=['Item', 'Count'])
Suppose I need to change only the maximum value in each group of the "Item" column by adding 1.
The output should look like this:
Item Count New_Count
A 60 61
A 20 20
A 21 21
B 33 34
B 33 34
B 32 32
I tried df['New_Count']=df.groupby(['Item'])['Count'].transform(lambda x: max(x)+1)
but instead of changing only the maximum, every row in "New_Count" got its group's maximum + 1:
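The lambda above returns a single scalar per group, so transform broadcasts that scalar to every row of the group. A minimal sketch of one fix, assuming the goal is to add 1 to every row equal to its group's maximum (ties included): make the lambda return a Series of the same length instead of a scalar.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 60], ['A', 20], ['A', 21], ['B', 33], ['B', 33], ['B', 32]],
    columns=['Item', 'Count'],
)

# x.eq(x.max()) is True only on the group's maximum rows; adding a boolean
# Series to an integer Series adds 1 where True and 0 where False.
df['New_Count'] = df.groupby('Item')['Count'].transform(lambda x: x + x.eq(x.max()))
print(df)
```

Because the lambda's result has the same length as the group, transform places each value back on its own row instead of broadcasting one number.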
Item Count New_Count
A 60 61
A 20 61
A 21 61
B 33 34
B 33 34
B 32 34
To get the maximum value of each group, you can apply the pandas max() method directly to the selected column(s) of a groupby result.
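A small sketch on the question's data showing what that returns. groupby(...).max() gives one row per group, which is why it cannot be assigned straight into a same-length column; transform('max') broadcasts the same maxima back onto the original rows.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 60], ['A', 20], ['A', 21], ['B', 33], ['B', 33], ['B', 32]],
    columns=['Item', 'Count'],
)

# One value per group: the index is the "Item" key, the value is the group maximum.
group_max = df.groupby('Item')['Count'].max()
print(group_max)

# Same maxima, but aligned to the original rows (one value per row).
row_max = df.groupby('Item')['Count'].transform('max')
print(row_max.tolist())
```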
Pandas DataFrame max() method: max() returns a Series with the maximum value of each column. With axis='columns', it searches across each row instead and returns the maximum value for each row.
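A quick illustration of the two axis settings, using a hypothetical toy DataFrame (`scores`) rather than the question's data:

```python
import pandas as pd

# Hypothetical toy data, only to show the direction of the reduction.
scores = pd.DataFrame({'x': [1, 5, 3], 'y': [4, 2, 6]})

# Default (axis='index'): reduce down each column -> one maximum per column.
per_column = scores.max()

# axis='columns': reduce across each row -> one maximum per row.
per_row = scores.max(axis='columns')

print(per_column)
print(per_row)
```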
A pandas Series also has an argmax() method, which returns the integer position of the maximum value in the Series. A Series is a one-dimensional data structure whose values are labeled by an index, and it is those index labels that you use to access the data with .loc.
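The difference between a position and a label matters as soon as the index is not 0..n-1. A minimal sketch with a hypothetical Series: argmax() gives the position, while idxmax() gives the index label, which is what .loc expects. Both pick the first occurrence if the maximum is tied.

```python
import pandas as pd

# Hypothetical Series with a non-default index.
s = pd.Series([20, 60, 21], index=[10, 11, 12])

pos = s.argmax()    # integer position of the maximum
label = s.idxmax()  # index label of the maximum

print(pos, label)
print(s.loc[label])  # .loc works with the label, not the position
```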
Use idxmax:
idx = df.groupby("Item")["Count"].idxmax()
df["New_Count"] = df["Count"]
df.loc[idx, "New_Count"] += 1
This will only increment the first occurrence of the maximum in each group.
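To see that first-occurrence behavior on the question's data: group B has a tie (rows 3 and 4 are both 33), and idxmax keeps only row 3, so row 4 stays at 33 instead of becoming 34 as in the desired output.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 60], ['A', 20], ['A', 21], ['B', 33], ['B', 33], ['B', 32]],
    columns=['Item', 'Count'],
)

idx = df.groupby('Item')['Count'].idxmax()
print(idx.tolist())  # [0, 3]  (row 4 is ignored although it ties with row 3)

df['New_Count'] = df['Count']
df.loc[idx, 'New_Count'] += 1
print(df)
```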
If you want to increment all the maximum values in the case of a tie, you can use transform instead. Just replace the first line above with:
idx = df.groupby("Item")["Count"].transform("max") == df["Count"]
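Putting the tie-aware variant together as one runnable sketch on the question's data. Here idx is a boolean mask rather than a list of labels, and it uses the string alias "max" so pandas runs its own groupby max instead of calling Python's built-in max on each group.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 60], ['A', 20], ['A', 21], ['B', 33], ['B', 33], ['B', 32]],
    columns=['Item', 'Count'],
)

# Boolean mask: True on every row equal to its group's maximum, ties included.
idx = df.groupby("Item")["Count"].transform("max") == df["Count"]

df["New_Count"] = df["Count"]
df.loc[idx, "New_Count"] += 1
print(df)
```

This reproduces the desired output from the question, with both B rows at 33 bumped to 34.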
You can use idxmax() to get the index label of the maximum in each group, and increment only those rows, like this:
max_idxs = df.groupby(['Item'])['Count'].idxmax()
df['New_Count']=df['Count'] # copy entire column
df.loc[max_idxs, 'New_Count'] += 1 # increment only the maximum item of each group by 1 (.loc avoids chained assignment)
Here's another way that uses duplicated instead of groupby:
df.loc[~df.sort_values('Count', ascending=False).duplicated('Item'), 'Count'] += 1
Output:
Item Count
0 A 61
1 A 20
2 A 21
3 B 34
4 B 33
5 B 32
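A variant of the duplicated approach that writes to a separate New_Count column, as the question asks, instead of overwriting Count. As an assumption on my part, it passes kind='stable' to sort_values so that tied rows keep their original order, which makes "first occurrence" deterministic.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 60], ['A', 20], ['A', 21], ['B', 33], ['B', 33], ['B', 32]],
    columns=['Item', 'Count'],
)

# Largest first; a stable sort keeps tied rows in their original order.
ordered = df.sort_values('Count', ascending=False, kind='stable')

# duplicated('Item') is False only for the first row of each Item in that order,
# so negating it marks one maximum row per group.
is_max = ~ordered.duplicated('Item')

df['New_Count'] = df['Count']
# .loc aligns the boolean Series on the index, so the sorted order is harmless here.
df.loc[is_max, 'New_Count'] += 1
print(df)
```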
To change all the maximum values when the maximum is repeated, you can combine .groupby(), .join() and np.where():
import numpy as np
import pandas as pd
df = pd.DataFrame([
['A', 60],
['A', 60],
['A', 20],
['A', 21],
['B', 21],
['B', 33],
['B', 34],
], columns=['Item', 'Count'])
s = df.groupby('Item')['Count'].max().rename('newCount')
df = df.set_index('Item').join(s).reset_index()
df['newCount'] = np.where(df['Count'] != df['newCount'], df['Count'], (df['newCount'] + 1))
df.head(10)
#output
Item Count newCount
0 A 60 61
1 A 60 61
2 A 20 20
3 A 21 21
4 B 21 21
5 B 33 33
6 B 34 35
We can replace the .join() with a .transform(), as suggested by @Dan:
df['newCount'] = df.groupby('Item')['Count'].transform('max')
df['newCount'] = np.where(df['Count'] != df['newCount'], df['Count'], (df['newCount'] + 1))
#output
Item Count newCount
0 A 60 61
1 A 60 61
2 A 20 20
3 A 21 21
4 B 21 21
5 B 33 33
6 B 34 35
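If you prefer to stay in pandas without importing numpy, Series.mask can stand in for np.where here: it keeps each value where the condition is False and takes the replacement where it is True. A sketch on the same seven-row example:

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 60], ['A', 60], ['A', 20], ['A', 21], ['B', 21], ['B', 33], ['B', 34]],
    columns=['Item', 'Count'],
)

group_max = df.groupby('Item')['Count'].transform('max')

# Where Count equals its group's maximum, use Count + 1; elsewhere keep Count.
df['newCount'] = df['Count'].mask(df['Count'].eq(group_max), df['Count'] + 1)
print(df)
```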