
Pandas groupby quantile values

Tags: python, pandas

I tried to calculate specific quantile values from a data frame, as shown in the code below. Calculating each quantile on a separate line works without problems.

When I try to run the last two lines, I get the following error:

AttributeError: 'SeriesGroupBy' object has no attribute 'quantile(0.25)'

How can I fix this?

import pandas as pd
df = pd.DataFrame(
    {
        'x': [0, 1, 0, 1, 0, 1, 0, 1],
        'y': [7, 6, 5, 4, 3, 2, 1, 0],
        'number': [25000, 35000, 45000, 50000, 60000, 70000, 65000, 36000]
    }
)
f = {'number': ['median', 'std', 'quantile']}
df1 = df.groupby('x').agg(f)
df.groupby('x').quantile(0.25)
df.groupby('x').quantile(0.75)

# code below with problem:
f = {'number': ['median', 'std', 'quantile(0.25)', 'quantile(0.75)']}
df1 = df.groupby('x').agg(f)
lignin asked Dec 04 '17 16:12


2 Answers

I prefer to define named functions with def:

def q1(x):
    return x.quantile(0.25)

def q3(x):
    return x.quantile(0.75)

f = {'number': ['median', 'std', q1, q3]}
df1 = df.groupby('x').agg(f)
df1
Out[1643]: 
  number                            
  median           std     q1     q3
x                                   
0  52500  17969.882211  40000  61250
1  43000  16337.584481  35750  55000
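
For context, here is a minimal sketch of why the string version fails, judging from the error message in the question: a string passed to agg appears to be treated as a method name on the grouped object, so 'quantile' resolves but 'quantile(0.25)' does not. The variable names (grouped, has_plain, has_with_args, q1_direct, q1_via_agg) are illustrative and not part of the original code.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'x': [0, 1, 0, 1, 0, 1, 0, 1],
        'number': [25000, 35000, 45000, 50000, 60000, 70000, 65000, 36000],
    }
)
grouped = df.groupby('x')['number']

# 'quantile' is a real method of the grouped object ...
has_plain = hasattr(grouped, 'quantile')
# ... but no attribute is literally named 'quantile(0.25)'.
has_with_args = hasattr(grouped, 'quantile(0.25)')

# Calling the method directly is one way to pass the argument.
q1_direct = grouped.quantile(0.25)

# A named function handed to agg applies the same call once per group.
def q1(s):
    return s.quantile(0.25)

q1_via_agg = grouped.agg(q1)

print(has_plain, has_with_args)
print(q1_via_agg)
```

Both q1_direct and q1_via_agg should hold the same per-group values as the q1 column shown above.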
BENY answered Sep 29 '22 11:09


@WeNYoBen's answer is great. It has one limitation, though: it needs a new function for every quantile, which becomes unpythonic when the number of quantiles grows large. A better approach is to use a function that creates the quantile function and renames it appropriately.

def rename(newname):
    def decorator(f):
        f.__name__ = newname
        return f
    return decorator

def q_at(y):
    @rename(f'q{y:0.2f}')
    def q(x):
        return x.quantile(y)
    return q

f = {'number': ['median', 'std', q_at(0.25), q_at(0.75)]}
df1 = df.groupby('x').agg(f)
df1

Out[]:
  number                            
  median           std  q0.25  q0.75
x                                   
0  52500  17969.882211  40000  61250
1  43000  16337.584481  35750  55000

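The rename decorator gives each function returned by q_at its own name, so that the pandas agg function can tell them apart. Without it, every returned function would be named q and the quantile columns could not be distinguished (depending on the pandas version, agg may reject duplicate function names outright).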
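
As a sketch of how this scales to many quantiles, the same q_at factory can be driven by a list comprehension. The quantile list and the names quantiles, funcs and result are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'x': [0, 1, 0, 1, 0, 1, 0, 1],
        'number': [25000, 35000, 45000, 50000, 60000, 70000, 65000, 36000],
    }
)

def rename(newname):
    # Decorator that overwrites the wrapped function's __name__.
    def decorator(f):
        f.__name__ = newname
        return f
    return decorator

def q_at(y):
    # Build a quantile function named after its level, e.g. 'q0.25'.
    @rename(f'q{y:0.2f}')
    def q(x):
        return x.quantile(y)
    return q

# Illustrative list of quantile levels; any values in [0, 1] would do.
quantiles = [0.10, 0.25, 0.50, 0.75, 0.90]
funcs = ['median', 'std'] + [q_at(level) for level in quantiles]

result = df.groupby('x').agg({'number': funcs})
print(result)
```

Each quantile lands in its own column under number, named after its level, so adding another quantile only means adding one more entry to the list.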

Jurgen Strydom answered Sep 29 '22 09:09