Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Grouping and Transforming in pandas

My data have the following structure:

       Name     Value       id
0    Alegro  0.850122   alegro
1    Alegro  0.447362   alegro
2    AlEgro  0.711295   alegro
3    ALEGRO  0.123761   alegro
4    alegRo  0.273111   alegro
5    ALEGRO  0.564893   alegro
6    ALEGRO  0.276369   alegro
7    ALEGRO  0.526434   alegro
8    ALEGRO  0.924014   alegro
9    ALEGrO  0.629207   alegro
10  Belagio  0.834231  belagio
11  BElagio  0.788357  belagio
12  Belagio  0.092156  belagio
13  BeLaGio  0.810275  belagio

To replicate run the code below:

data = {'Name': ['Alegro', 'Alegro', 'AlEgro', 'ALEGRO', 'alegRo', 'ALEGRO','ALEGRO',
                 'ALEGRO','ALEGRO','ALEGrO', 'Belagio','BElagio', 'Belagio', 'BeLaGio'],
     'Value': np.random.random(14)}

df = pd.DataFrame(data)

df['id'] = df.Name.str.lower()

You can see that there are some typos im dataset.

df.groupby('id').Name.value_counts()

id       Name   
alegro   ALEGRO     5
         Alegro     2
         ALEGrO     1
         AlEgro     1
         alegRo     1
belagio  Belagio    2
         BElagio    1
         BeLaGio    1

So the aim is to take the most frequent value from each category and set it as New name. For the first group it would be ALEGRO and for second Belagio.

The desired data frame should be:

       Name     Value       id
0    ALEGRO  0.850122   alegro
1    ALEGRO  0.447362   alegro
2    ALEGRO  0.711295   alegro
3    ALEGRO  0.123761   alegro
4    ALEGRO  0.273111   alegro
5    ALEGRO  0.564893   alegro
6    ALEGRO  0.276369   alegro
7    ALEGRO  0.526434   alegro
8    ALEGRO  0.924014   alegro
9    ALEGRO  0.629207   alegro
10  Belagio  0.834231  belagio
11  Belagio  0.788357  belagio
12  Belagio  0.092156  belagio
13  Belagio  0.810275  belagio

Any idea would be highly appreciated!

like image 277
And_R Avatar asked Jan 30 '19 15:01

And_R


People also ask

What is transformation in pandas?

Pandas Series: transform() function The transform() function is used to call function on self producing a Series with transformed values and that has the same axis length as self.

What does Groupby transform do?

transform. Call function producing a same-indexed DataFrame on each group. Returns a DataFrame having the same indexes as the original object filled with the transformed values.

What is the difference between transform and apply pandas?

transform() can take a function, a string function, a list of functions, and a dict. However, apply() is only allowed a function. apply() works with multiple Series at a time. But, transform() is only allowed to work with a single Series at a time.

How do you group values in pandas?

The Hello, World! of pandas GroupBy You call . groupby() and pass the name of the column that you want to group on, which is "state" . Then, you use ["last_name"] to specify the columns on which you want to perform the actual aggregation.


1 Answers

Use GroupBy.transform for return Series with same size like original DataFrame, so possible create new column.

df['New'] = df.groupby('id').Name.transform(lambda x: x.value_counts().index[0])

Another solution:

df['New'] = df.groupby('id').Name.transform(lambda x: x.mode().iat[0])

print (df)
       Name     Value       id      New
0    Alegro  0.850122   alegro   ALEGRO
1    Alegro  0.447362   alegro   ALEGRO
2    AlEgro  0.711295   alegro   ALEGRO
3    ALEGRO  0.123761   alegro   ALEGRO
4    alegRo  0.273111   alegro   ALEGRO
5    ALEGRO  0.564893   alegro   ALEGRO
6    ALEGRO  0.276369   alegro   ALEGRO
7    ALEGRO  0.526434   alegro   ALEGRO
8    ALEGRO  0.924014   alegro   ALEGRO
9    ALEGrO  0.629207   alegro   ALEGRO
10  Belagio  0.834231  belagio  Belagio
11  BElagio  0.788357  belagio  Belagio
12  Belagio  0.092156  belagio  Belagio
13  BeLaGio  0.810275  belagio  Belagio
like image 170
jezrael Avatar answered Sep 19 '22 20:09

jezrael