pandas aggregate function in groupby - default option?

I have the following dataset (df). I want to group it by brand (used as the index), take the mean of the workers and value columns, and keep the first value of the provider column.

brand   workers  value  provider
H&M     322      56     mark
H&M     450      433    mark
Lindex  678      233    luke
Lindex  543      456    luke
Levi    234      32     chris
Levi    789      12     chris

Right now I can do this:

df = df.groupby('brand')[['workers', 'value', 'provider']].agg({'workers': 'mean', 'value': 'mean', 'provider': 'first'}).reset_index()

but my real dataset has many more columns whose mean I want, and I don't want to list each of them. Is there a better way to declare a default aggregation function?

Something like: "take the mean of all the non-string columns and the first observation of the string columns"?

asked Jul 09 '18 03:07 by Filippo Sebastio


People also ask

Can we use group by with aggregate function pandas?

Yes. In pandas, groupby can be combined with one or more aggregation functions to summarize data quickly. The split-apply-combine idea behind it is simple enough that most new pandas users pick it up fast.
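A minimal sketch, assuming a recent pandas (1.x or 2.x) and the question's sample data typed in by hand, of one groupby call producing several statistics per group:

```python
import pandas as pd

# Sample data from the question, rebuilt as a DataFrame
df = pd.DataFrame({
    "brand": ["H&M", "H&M", "Lindex", "Lindex", "Levi", "Levi"],
    "workers": [322, 450, 678, 543, 234, 789],
    "value": [56, 433, 233, 456, 32, 12],
    "provider": ["mark", "mark", "luke", "luke", "chris", "chris"],
})

# One call computes the mean and max of each numeric column per brand;
# the result has MultiIndex columns such as ("workers", "mean")
summary = df.groupby("brand")[["workers", "value"]].agg(["mean", "max"])
print(summary)
```

Group keys are sorted by default, so the rows come out as H&M, Levi, Lindex.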

What is AGG in groupby pandas?

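agg aggregates using one or more operations over the specified axis. Its func argument is the function to use for aggregating the data; if it is a function, it must either work when passed a DataFrame or when passed to DataFrame.apply. It can also be a function name as a string, a list of functions or names, or a dict mapping column labels to functions.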
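A short sketch of the accepted forms of func on a grouped frame, using a trimmed copy of the question's data (the variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["H&M", "H&M", "Levi", "Levi"],
    "workers": [322, 450, 234, 789],
    "provider": ["mark", "mark", "chris", "chris"],
})
g = df.groupby("brand")

# A function name given as a string
by_name = g["workers"].agg("mean")
# A callable that reduces each group's Series to a single value
by_callable = g["workers"].agg(lambda s: s.max() - s.min())
# A dict mapping each column label to the aggregation for that column
by_dict = g.agg({"workers": "mean", "provider": "first"})

print(by_name, by_callable, by_dict, sep="\n\n")
```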

Can I use group by without aggregate function pandas?

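Yes. groupby returns a GroupBy object, and you do not have to reduce each group to a single row: you can iterate over it to get (key, sub-DataFrame) pairs, or use methods such as transform and filter, which return results aligned to (or taken from) the original rows.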
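A minimal sketch of grouping without collapsing the rows, assuming the same brand/workers columns as the question; transform broadcasts each group's mean back to every row, and plain iteration just hands you the groups:

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["H&M", "H&M", "Levi", "Levi"],
    "workers": [322, 450, 234, 789],
})
g = df.groupby("brand")

# transform returns one value per original row, aligned to df's index
brand_mean = g["workers"].transform("mean")
with_mean = df.assign(brand_mean_workers=brand_mean)

# Iterating yields (group key, sub-DataFrame) pairs without reducing anything
sizes = {key: len(group) for key, group in g}

print(with_mean)
print(sizes)
```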

How does AGG work in pandas?

The DataFrame agg() method applies a function, or a list of function names, along one axis of the DataFrame. The default axis is 0, the index (row) axis, which aggregates down each column. Note: agg() is an alias of aggregate().
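A small sketch contrasting the two axes of DataFrame.agg on a few rows of the question's numeric columns:

```python
import pandas as pd

df = pd.DataFrame({"workers": [322, 450, 678], "value": [56, 433, 233]})

# axis=0 (default): aggregate down each column, one result per column
per_column = df.agg(["sum", "mean"])
# axis=1: aggregate across each row, one result per row
per_row = df.agg("sum", axis=1)

print(per_column)
print(per_row)
```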


1 Answer

No, but it isn't that hard to write some code to do it for you.

f = dict.fromkeys(df, 'mean')
f.update(
    dict.fromkeys(df.columns[df.dtypes.eq(object)], 'first'))

print(f)
{'brand': 'first', 'workers': 'mean', 'value': 'mean', 'provider': 'first'}

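You then pass f to agg. Remove the grouper from f first: brand is the grouping key, not one of the aggregated columns, so current pandas raises a KeyError if it is left in.

del f['brand']
df = df.groupby('brand')[['workers', 'value', 'provider']].agg(f)

If you want brand back as a regular column instead of the index (like reset_index()), run the grouping on the original df with as_index=False:

df = df.groupby('brand', as_index=False)[['workers', 'value', 'provider']].agg(f)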
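A self-contained sketch of the same idea, wrapped in a hypothetical helper `default_agg_spec` (not a pandas function). It assumes "string columns" means "every non-numeric column" and picks numeric columns with select_dtypes instead of comparing dtypes to object, so columns stored with pandas' dedicated string dtype are also routed to "first". The grouping column is skipped up front:

```python
import pandas as pd

def default_agg_spec(df, by, numeric="mean", other="first"):
    """Hypothetical helper: map every non-grouping column to an aggregation.

    Numeric columns get `numeric`; everything else (object, string,
    categorical, ...) gets `other`. The grouping column is left out
    because it becomes a key of the result, not an aggregated column.
    """
    numeric_cols = set(df.select_dtypes(include="number").columns)
    return {
        col: (numeric if col in numeric_cols else other)
        for col in df.columns
        if col != by
    }

df = pd.DataFrame({
    "brand": ["H&M", "H&M", "Lindex", "Lindex", "Levi", "Levi"],
    "workers": [322, 450, 678, 543, 234, 789],
    "value": [56, 433, 233, 456, 32, 12],
    "provider": ["mark", "mark", "luke", "luke", "chris", "chris"],
})

spec = default_agg_spec(df, "brand")
# as_index=False keeps brand as an ordinary first column
result = df.groupby("brand", as_index=False).agg(spec)
print(spec)
print(result)
```

Because the spec is built from df.columns, new numeric or text columns are picked up automatically without editing the agg call.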
answered Oct 05 '22 23:10 by cs95