My data have the following structure:
Name Value id
0 Alegro 0.850122 alegro
1 Alegro 0.447362 alegro
2 AlEgro 0.711295 alegro
3 ALEGRO 0.123761 alegro
4 alegRo 0.273111 alegro
5 ALEGRO 0.564893 alegro
6 ALEGRO 0.276369 alegro
7 ALEGRO 0.526434 alegro
8 ALEGRO 0.924014 alegro
9 ALEGrO 0.629207 alegro
10 Belagio 0.834231 belagio
11 BElagio 0.788357 belagio
12 Belagio 0.092156 belagio
13 BeLaGio 0.810275 belagio
To replicate, run the code below:
import numpy as np
import pandas as pd

data = {'Name': ['Alegro', 'Alegro', 'AlEgro', 'ALEGRO', 'alegRo', 'ALEGRO', 'ALEGRO',
                 'ALEGRO', 'ALEGRO', 'ALEGrO', 'Belagio', 'BElagio', 'Belagio', 'BeLaGio'],
        'Value': np.random.random(14)}
df = pd.DataFrame(data)
df['id'] = df.Name.str.lower()
You can see that there are some typos in the dataset.
df.groupby('id').Name.value_counts()
id Name
alegro ALEGRO 5
Alegro 2
ALEGrO 1
AlEgro 1
alegRo 1
belagio Belagio 2
BElagio 1
BeLaGio 1
So the aim is to take the most frequent value from each category and set it as the new name. For the first group it would be ALEGRO and for the second Belagio.
The desired data frame should be:
Name Value id
0 ALEGRO 0.850122 alegro
1 ALEGRO 0.447362 alegro
2 ALEGRO 0.711295 alegro
3 ALEGRO 0.123761 alegro
4 ALEGRO 0.273111 alegro
5 ALEGRO 0.564893 alegro
6 ALEGRO 0.276369 alegro
7 ALEGRO 0.526434 alegro
8 ALEGRO 0.924014 alegro
9 ALEGRO 0.629207 alegro
10 Belagio 0.834231 belagio
11 Belagio 0.788357 belagio
12 Belagio 0.092156 belagio
13 Belagio 0.810275 belagio
Any idea would be highly appreciated!
Series.transform() calls a function on the Series and produces a Series of transformed values with the same axis length as the original.
GroupBy.transform() calls a function on each group and returns an object with the same index as the original, filled with the transformed values.
Series.transform() and DataFrame.transform() accept a function, a function name as a string, a list of functions, or a dict. In a groupby, apply() hands each group to the function as a whole, so it can work with several columns at once, whereas transform() hands over one column (Series) at a time.
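To make the "same length as the original" point concrete, here is a minimal sketch on a small made-up frame (the data and the helper name most_frequent are illustrative, not taken from the question). It compares agg(), which returns one value per group, with transform(), which broadcasts that value back to every row of the group.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the question's dataset)
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b", "b"],
    "Name": ["A", "A", "a", "B", "b", "B"],
})

def most_frequent(s: pd.Series) -> str:
    # value_counts() sorts by count in descending order,
    # so the first index label is the most frequent value
    return s.value_counts().index[0]

# agg() collapses each group to a single value: one row per id
per_group = df.groupby("id")["Name"].agg(most_frequent)

# transform() broadcasts the group's value back to every original row
per_row = df.groupby("id")["Name"].transform(most_frequent)

print(per_group)
print(per_row)
```

Because per_row shares the index of df, it can be assigned directly as a column, which is what the answer relies on.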
Use GroupBy.transform, which returns a Series with the same size as the original DataFrame, so it can be assigned as a new column:
df['New'] = df.groupby('id').Name.transform(lambda x: x.value_counts().index[0])
Another solution:
df['New'] = df.groupby('id').Name.transform(lambda x: x.mode().iat[0])
print (df)
Name Value id New
0 Alegro 0.850122 alegro ALEGRO
1 Alegro 0.447362 alegro ALEGRO
2 AlEgro 0.711295 alegro ALEGRO
3 ALEGRO 0.123761 alegro ALEGRO
4 alegRo 0.273111 alegro ALEGRO
5 ALEGRO 0.564893 alegro ALEGRO
6 ALEGRO 0.276369 alegro ALEGRO
7 ALEGRO 0.526434 alegro ALEGRO
8 ALEGRO 0.924014 alegro ALEGRO
9 ALEGrO 0.629207 alegro ALEGRO
10 Belagio 0.834231 belagio Belagio
11 BElagio 0.788357 belagio Belagio
12 Belagio 0.092156 belagio Belagio
13 BeLaGio 0.810275 belagio Belagio
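If the goal is to overwrite Name itself, as in the desired frame from the question, one possible variant is to compute the most frequent spelling once per id and map it back. This is only a sketch: the seeded generator and the variable names (names, canonical) are my own assumptions, and ties are resolved by mode(), which returns its values in sorted order.

```python
import numpy as np
import pandas as pd

names = ['Alegro', 'Alegro', 'AlEgro', 'ALEGRO', 'alegRo', 'ALEGRO', 'ALEGRO',
         'ALEGRO', 'ALEGRO', 'ALEGrO', 'Belagio', 'BElagio', 'Belagio', 'BeLaGio']

rng = np.random.default_rng(0)  # arbitrary seed, only to make Value reproducible
df = pd.DataFrame({'Name': names, 'Value': rng.random(len(names))})
df['id'] = df['Name'].str.lower()

# One canonical spelling per id: the most frequent Name in the group.
# mode() returns sorted values, so a tie picks the first in sort order.
canonical = df.groupby('id')['Name'].agg(lambda s: s.mode().iat[0])

# Look up each row's id in the per-group result and overwrite Name
df['Name'] = df['id'].map(canonical)
print(df)
```

The intermediate canonical Series is small (one entry per id), so it can also be inspected or corrected by hand before it is mapped back.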