
Groupby and aggregate using lambda functions

I am trying to group and aggregate a DataFrame using lambda functions that are created programmatically, so that I can simulate a one-hot encoding of the categories present in a column.

Dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[10, 'A'], [10, 'B'], [20, 'A'], [30, 'B']]),
                   columns=['ID', 'category'])

ID category
10 A
10 B
20 A
30 B

Expected result:

ID A B
10 1 1
20 1 0
30 0 1

What I am trying:

one_hot_columns = ['A','B']
lambdas = [lambda x: 1 if x.eq(column).any() else 0 for column in one_hot_columns]
df_g = df.groupby('ID').category.agg(lambdas)

Result:

ID A B
10 1 1
20 0 0
30 1 1

But the above is not quite the expected result. Not sure what I am doing wrong. I know I could do this with get_dummies, but using lambdas is more convenient for automation. Also, I can ensure the order of the output columns.
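
A note on the likely cause: Python closures look up `column` when the lambda is called, not when it is created, so every lambda in the list ends up comparing against the last value, 'B'. That matches the result above, where both columns flag only the IDs that contain B. Below is a minimal sketch of one way around this, assuming a reasonably recent pandas that supports named aggregation on a grouped Series. The integer ID column and the default-argument trick are illustrative choices, not part of the original post.

```python
import pandas as pd

# Same data as the question, built from a dict so ID stays an integer (an assumption).
df = pd.DataFrame({'ID': [10, 10, 20, 30],
                   'category': ['A', 'B', 'A', 'B']})

one_hot_columns = ['A', 'B']

# column=column freezes the current loop value as a default argument,
# so each lambda keeps its own category instead of sharing the last one.
lambdas = {column: (lambda x, column=column: int(x.eq(column).any()))
           for column in one_hot_columns}

# Named aggregation: each keyword becomes an output column, in dict order.
df_g = df.groupby('ID')['category'].agg(**lambdas).reset_index()
print(df_g)
```

Because the output columns follow the order of `one_hot_columns`, this keeps the column-order guarantee the question asks for.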

Pab asked Dec 03 '20 03:12



2 Answers

Use crosstab:

pd.crosstab(df.ID, df['category']).reset_index()

Output:

category  ID  A  B
0         10  1  1
1         20  1  0
2         30  0  1
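
One caveat worth sketching: `crosstab` counts occurrences, so a repeated (ID, category) pair yields 2 rather than 1, and a category that never appears gets no column. A possible adjustment is shown here, assuming a duplicated row and a hypothetical extra category 'C', neither of which is in the original data:

```python
import pandas as pd

# Question data plus one duplicated (30, 'B') row, added only for illustration.
df = pd.DataFrame({'ID': [10, 10, 20, 30, 30],
                   'category': ['A', 'B', 'A', 'B', 'B']})

# Desired output columns, in order; 'C' is a hypothetical category with no rows.
one_hot_columns = ['A', 'B', 'C']

counts = pd.crosstab(df['ID'], df['category'])

flags = (counts.clip(upper=1)                                 # cap counts at 1
               .reindex(columns=one_hot_columns, fill_value=0)  # fix column order, add missing
               .reset_index())
print(flags)
```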
Quang Hoang answered Oct 16 '22 10:10


You can use pd.get_dummies with Groupby.sum:

In [4331]: res = pd.get_dummies(df, columns=['category']).groupby('ID', as_index=False).sum()

In [4332]: res
Out[4332]: 
   ID  category_A  category_B
0  10           1           1
1  20           1           0
2  30           0           1

Or use pd.concat with pd.get_dummies:

In [4329]: res = pd.concat([df[['ID']], pd.get_dummies(df.category)], axis=1).groupby('ID', as_index=False).sum()

In [4330]: res
Out[4330]: 
   ID  A  B
0  10  1  1
1  20  1  0
2  30  0  1
Mayank Porwal answered Oct 16 '22 10:10