
How to calculate quantiles in a pandas multiindex DataFrame?

Tags: python, pandas

I have a pandas multiindex DataFrame, and I want to calculate the quantiles of its values over a specific index level. It's better to explain with an example.

First, let's create the DataFrame:

import itertools
import pandas as pd
import numpy as np

item = ('A', 'B')
item_type = (0, 1, 2)
location = range(5)
idx = pd.MultiIndex.from_tuples(list(itertools.product(item, item_type, location)), names=('Item', 'Type', 'Location'))
df = pd.DataFrame(np.random.randn(len(idx), 3), index=idx, columns=('C1', 'C2', 'C3'))
df

Let's say we want a table of the median of each column for every (Item, Type) pair, taken over all locations. This is easy enough to do with the built-in .median method:

median_df = df.median(level=[0,1])
median_df

This produces a three-column DataFrame indexed by a MultiIndex of (Item, Type). The same works for most common functions like .mean, .max, .min, etc.

But it does not work for .quantile: strangely enough, quantile does not have the 'level' parameter.

How can I calculate a given quantile in the same way as I did for the median, etc.?

germ asked Apr 06 '16 19:04



1 Answer

Group by the MultiIndex levels first, then apply quantile to the grouped object (called with no argument, it returns the 0.5 quantile):

df.groupby(level=[0,1]).quantile()

The same approach works for the median function, so the following line is equivalent to your code df.median(level=[0,1]). It is also the form to use on pandas 2.0 and later, where the level argument of .median was removed:

df.groupby(level=[0,1]).median()
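
To get a quantile other than the default, pass q to quantile on the grouped object. The following is a minimal sketch, assuming the DataFrame from the question; the fixed random seed and the variable names grouped, q25 and q_multi are additions for illustration, not part of the original code.

```python
import itertools

import numpy as np
import pandas as pd

# Rebuild the question's DataFrame; the seed is an assumption added for reproducibility
rng = np.random.default_rng(0)
idx = pd.MultiIndex.from_tuples(
    list(itertools.product(('A', 'B'), (0, 1, 2), range(5))),
    names=('Item', 'Type', 'Location'),
)
df = pd.DataFrame(rng.standard_normal((len(idx), 3)), index=idx, columns=('C1', 'C2', 'C3'))

# Group by the first two index levels, referenced here by name instead of position
grouped = df.groupby(level=['Item', 'Type'])

# A single quantile: one row per (Item, Type) pair
q25 = grouped.quantile(0.25)

# Several quantiles at once: the quantile value becomes an extra innermost index level
q_multi = grouped.quantile([0.25, 0.75])
```

With a list of quantiles, the result has one row per (Item, Type, quantile) combination, so a single quantile can be pulled back out with q_multi.xs(0.25, level=-1).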

The GroupBy object returned by groupby also has an agg method, which applies several functions in one call. The resulting DataFrame has a MultiIndex on its columns, with one sub-column per function under each original column:

df.groupby(level=[0,1]).agg(['median', 'quantile'])
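
Passing the string 'quantile' to agg uses the default 0.5 quantile, which duplicates the median. One way to request other quantiles in the same call is to pass small named functions. This is a sketch under the same assumptions as the question's DataFrame; the seed and the helper names q25 and q75 are hypothetical and chosen only for illustration.

```python
import itertools

import numpy as np
import pandas as pd

# Rebuild the question's DataFrame; the seed is an assumption added for reproducibility
rng = np.random.default_rng(0)
idx = pd.MultiIndex.from_tuples(
    list(itertools.product(('A', 'B'), (0, 1, 2), range(5))),
    names=('Item', 'Type', 'Location'),
)
df = pd.DataFrame(rng.standard_normal((len(idx), 3)), index=idx, columns=('C1', 'C2', 'C3'))


def q25(s):
    """Hypothetical helper: 25th percentile of one group's column."""
    return s.quantile(0.25)


def q75(s):
    """Hypothetical helper: 75th percentile of one group's column."""
    return s.quantile(0.75)


# agg uses each function's __name__ as the second-level column label
summary = df.groupby(level=[0, 1]).agg(['median', q25, q75])
```

Each original column (C1, C2, C3) then carries three sub-columns named median, q25 and q75, so a single statistic can be selected with, for example, summary[('C1', 'q25')].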
Zeugma answered Sep 28 '22 03:09