I understand how to create simple quantiles in Pandas using pd.qcut. But after searching around, I don't see anything to create weighted quantiles. Specifically, I wish to create a variable which bins the values of a variable of interest (from smallest to largest) such that each bin contains an equal weight. So far this is what I have:
def wtdQuantile(dataframe, var, weight = None, n = 10):
    if weight == None:
        return pd.qcut(dataframe[var], n, labels = False)
    else:
        dataframe.sort_values(var, ascending = True, inplace = True)
        cum_sum = dataframe[weight].cumsum()
        cutoff = max(cum_sum)/n
        quantile = cum_sum/cutoff
        quantile[-1:] -= 1
        return quantile.map(int)
Is there an easier way, or something prebuilt from Pandas that I'm missing?
Edit: As requested, I'm providing some sample data. In the following, I'm trying to bin the "Var" variable using "Weight" as the weight. Using pd.qcut, we get an equal number of observations in each bin. Instead, I want an equal weight in each bin, or in this case, as close to equal as possible.
Weight  Var  pd.qcut(n=5)  Desired_Rslt
   10     1            0              0
   14     2            0              0
   18     3            1              0
   15     4            1              1
   30     5            2              1
   12     6            2              2
   20     7            3              2
   25     8            3              3
   29     9            4              3
   45    10            4              4
                Pandas DataFrame quantile() Method The quantile() method calculates the quantile of the values in a given axis. Default axis is row. By specifying the column axis ( axis='columns' ), the quantile() method calculates the quantile column-wise and returns the mean value for each row.
In Python, the numpy. quantile() function takes an array and a number say q between 0 and 1. It returns the value at the q th quantile.
To find percentiles of a numeric column in a DataFrame, or the percentiles of a Series in pandas, the easiest way is to use the pandas quantile() function. You can also use the numpy percentile() function.
I don't think this is built-in to Pandas, but here is a function that does what you want in a few lines:
import numpy as np
import pandas as pd
from pandas._libs.lib import is_integer
def weighted_qcut(values, weights, q, **kwargs):
    'Return weighted quantile cuts from a given series, values.'
    if is_integer(q):
        quantiles = np.linspace(0, 1, q + 1)
    else:
        quantiles = q
    order = weights.iloc[values.argsort()].cumsum()
    bins = pd.cut(order / order.iloc[-1], quantiles, **kwargs)
    return bins.sort_index()
We can test it on your data this way:
data = pd.DataFrame({
    'var': range(1, 11),
    'weight': [10, 14, 18, 15, 30, 12, 20, 25, 29, 45]
})
data['qcut'] = pd.qcut(data['var'], 5, labels=False)
data['weighted_qcut'] = weighted_qcut(data['var'], data['weight'], 5, labels=False)
print(data)
The output matches your desired result from above:
   var  weight  qcut  weighted_qcut
0    1      10     0              0
1    2      14     0              0
2    3      18     1              0
3    4      15     1              1
4    5      30     2              1
5    6      12     2              2
6    7      20     3              2
7    8      25     3              3
8    9      29     4              3
9   10      45     4              4
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With