pandas aggregate function in groupby - default option?

Tags: python, pandas, dataframe, group-by, pandas-groupby

I have the following dataset (df). I want to group it by brand, using brand as the index, and get the mean of the workers and value columns and the first value of the provider column.

brand   workers value   provider
H&M      322    56         mark
H&M      450    433        mark
Lindex  678     233        luke
Lindex  543     456        luke
Levi    234     32         chris
Levi    789     12         chris
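
To reproduce this, the table above can be built as a DataFrame roughly like this. It is a minimal sketch: the values are copied from the table, and the dtypes are whatever pandas infers from them.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "brand": ["H&M", "H&M", "Lindex", "Lindex", "Levi", "Levi"],
    "workers": [322, 450, 678, 543, 234, 789],
    "value": [56, 433, 233, 456, 32, 12],
    "provider": ["mark", "mark", "luke", "luke", "chris", "chris"],
})
```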

Now I can

df = df.groupby('brand')[['workers', 'value', 'provider']].agg({'workers': 'mean', 'value': 'mean', 'provider': 'first'}).reset_index()

but my real dataset has many more columns that I want to take the mean of, and I don't want to specify each of them. Is there a better way of declaring a default function?

Something like "take the mean of all the non-string columns and the first observation of the string columns"?

asked Jul 09 '18 03:07

Filippo Sebastio

1 Answer

No, but it isn't that hard to write some code to do it for you.

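# Default every column to 'mean' ...
f = dict.fromkeys(df, 'mean')
# ... then override the string (object dtype) columns with 'first'.
f.update(
    dict.fromkeys(df.columns[df.dtypes.eq(object)], 'first'))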

print(f)
{'brand': 'first', 'provider': 'first', 'value': 'mean', 'workers': 'mean'}

You then pass f to agg.

df = df.groupby('brand').agg(f)

If you want to reset the index, you will have to remove the grouper from f.

del f['brand']
df = df.groupby('brand', as_index=False)[['workers', 'value', 'provider']].agg(f)
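
Putting the pieces together, here is a minimal end-to-end sketch on the sample data from the question. The helper name `default_agg` is hypothetical (it is not part of pandas). It swaps the `object` dtype check for `pandas.api.types.is_numeric_dtype`, on the assumption that every non-numeric column should get 'first', and it leaves the grouper out of the mapping from the start so that `as_index=False` works directly.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def default_agg(df, by):
    """Hypothetical helper: 'mean' for numeric columns, 'first' for the rest."""
    # Build the mapping for every column except the grouper itself.
    spec = {
        col: "mean" if is_numeric_dtype(df[col]) else "first"
        for col in df.columns
        if col != by
    }
    return df.groupby(by, as_index=False).agg(spec)


# Sample data copied from the question.
df = pd.DataFrame({
    "brand": ["H&M", "H&M", "Lindex", "Lindex", "Levi", "Levi"],
    "workers": [322, 450, 678, 543, 234, 789],
    "value": [56, 433, 233, 456, 32, 12],
    "provider": ["mark", "mark", "luke", "luke", "chris", "chris"],
})

result = default_agg(df, "brand")
print(result)
```

Note that `is_numeric_dtype` also treats boolean columns as numeric, so those would be averaged too. If that is not what you want, adjust the condition in the comprehension.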

answered Oct 05 '22 23:10

cs95
