I have a pandas multiindex DataFrame, and I want to calculate the quantiles of its values over a specific index level. It's better to explain with an example.
First, let's create the DataFrame:
import itertools
import pandas as pd
import numpy as np
item = ('A', 'B')
item_type = (0, 1, 2)
location = range(5)
idx = pd.MultiIndex.from_tuples(list(itertools.product(item, item_type, location)), names=('Item', 'Type', 'Location'))
df = pd.DataFrame(np.random.randn(len(idx), 3), index=idx, columns=('C1', 'C2', 'C3'))
df
Let's say we want to calculate a table of the median of the column values for each Item and Type over all locations. This is easy enough to do with the builtin .median method:
median_df = df.median(level=[0,1])
median_df
This will produce a three-column DataFrame with multiindex=(Item, Type). It works for most common functions like .mean, .max, .min, etc.
But it does not work for .quantile--strangely enough, quantile does not have the 'level' parameter.
How can I calculate a given quantile in the same way as I did for the median, etc?
Pandas DataFrame quantile() method: the quantile() method calculates the quantile of the values along a given axis. The default axis is the index (axis=0), which gives one value per column. By specifying the column axis (axis='columns'), quantile() works across the columns and returns one quantile value for each row.
In Python, the numpy.quantile() function takes an array and a number q between 0 and 1, and returns the q-th quantile of the array's values.
The MultiIndex.from_tuples() function converts a list of tuples to a MultiIndex. It is one of several ways to construct a MultiIndex.
Apply the quantile function by first grouping by your multiindex levels:
df.groupby(level=[0,1]).quantile()
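Called with no argument, quantile() uses q=0.5, so the line above returns the median. For any other quantile, pass q explicitly. Here is a minimal sketch, assuming the df from the question (rebuilt with a fixed seed so the snippet runs on its own); the names q25 and q_multi are only illustrative:

```python
import itertools
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (seeded for reproducibility).
idx = pd.MultiIndex.from_tuples(
    list(itertools.product(('A', 'B'), (0, 1, 2), range(5))),
    names=('Item', 'Type', 'Location'),
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((len(idx), 3)),
                  index=idx, columns=('C1', 'C2', 'C3'))

# 25th percentile of each column, per (Item, Type), over all locations.
q25 = df.groupby(level=['Item', 'Type']).quantile(0.25)

# Several quantiles at once: the quantile value becomes an extra index level.
q_multi = df.groupby(level=['Item', 'Type']).quantile([0.25, 0.75])
```

Grouping by level names instead of positions ([0, 1]) gives the same result and reads a little more clearly.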
The same approach works for the median function. The following line is equivalent to your code df.median(level=[0,1]), and it keeps working on pandas 2.0 and later, where the level argument of median was removed:
df.groupby(level=[0,1]).median()
The GroupBy object returned by groupby also has an agg function that lets you apply several functions at once; the resulting DataFrame then has an extra level in its columns, with one entry per function:
df.groupby(level=[0,1]).agg(['median', 'quantile'])
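Note that the string 'quantile' in agg uses the default q=0.5, so it simply repeats the median. To mix in other quantiles, one option is to pass small named functions. This is a sketch under the same assumptions as the question's df (rebuilt here with a fixed seed); q25 and q75 are hypothetical helper names, not pandas functions:

```python
import itertools
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (seeded for reproducibility).
idx = pd.MultiIndex.from_tuples(
    list(itertools.product(('A', 'B'), (0, 1, 2), range(5))),
    names=('Item', 'Type', 'Location'),
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((len(idx), 3)),
                  index=idx, columns=('C1', 'C2', 'C3'))

def q25(s):
    """25th percentile of one column within one group (hypothetical helper)."""
    return s.quantile(0.25)

def q75(s):
    """75th percentile of one column within one group (hypothetical helper)."""
    return s.quantile(0.75)

# Each function's name becomes a label in the second column level.
summary = df.groupby(level=['Item', 'Type']).agg(['median', q25, q75])
```

A single value can then be read with a row key and a column key, for example summary.loc[('A', 0), ('C1', 'q25')].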