
Custom describe or aggregate without groupby

Tags: python, pandas

I want to use groupby.agg where my group is the entire dataframe. Put another way, I want to use the agg functionality without the groupby. I've looked for an example of this but cannot find one.

Here's what I've done:

import pandas as pd
import numpy as np

np.random.seed([3,1415])

df = pd.DataFrame(np.random.rand(6, 4), columns=list('ABCD'))
df


def describe(df):
    funcs = dict(Kurt=lambda x: x.kurt(),
                 Skew='skew',
                 Mean='mean',
                 Std='std')
    one_group = [True for _ in df.index]
    funcs_for_all = {k: funcs for k in df.columns}
    return df.groupby(one_group).agg(funcs_for_all).iloc[0].unstack().T

describe(df)


Question

How was I supposed to have done this?

asked Jul 04 '16 07:07 by piRSquared




1 Answer

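Here is a slightly more compact version of your own proposal, which I think is more readable. It relies on DataFrame.groupby() accepting a function, which is called on each index label; a lambda that always returns True puts every row into a single group: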

def describe(df):
    funcs = dict(Kurt=lambda x: x.kurt(),
                 Skew='skew',
                 Mean='mean',
                 Std='std')
    funcs_for_all = {k: funcs for k in df.columns}
    return df.groupby(lambda _ : True).agg(funcs_for_all).iloc[0].unstack().T

describe(df)
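
For what it's worth, on more recent pandas versions (0.20 and later, as far as I know) DataFrame.agg exists and accepts a list of function names, so the groupby step can be dropped altogether. The sketch below assumes the same df as in the question. The rename step is only there to reproduce the Kurt/Skew/Mean/Std labels used above and is not required. Also note that the nested-dict renaming passed to agg in the functions above was, to my knowledge, deprecated in pandas 0.20 and removed in 1.0, so this variant is the one more likely to run on a current install.

```python
import numpy as np
import pandas as pd

# Same setup as in the question.
np.random.seed([3, 1415])
df = pd.DataFrame(np.random.rand(6, 4), columns=list('ABCD'))

def describe(df):
    # DataFrame.agg applies each named reduction to every column.
    # The result has one row per statistic and one column per input column.
    stats = df.agg(['kurt', 'skew', 'mean', 'std'])
    # Optional: relabel the rows to match the names used in the question.
    return stats.rename(index={'kurt': 'Kurt', 'skew': 'Skew',
                               'mean': 'Mean', 'std': 'Std'})

result = describe(df)
print(result)
```

The rows are the statistics and the columns are A to D, which is the same orientation the groupby version produces after .iloc[0].unstack().T.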
answered Oct 22 '22 15:10 by kidmose