
DataFrame: add column whose values are the quantile number/rank of an existing column?

I have a DataFrame with some columns. I'd like to add a new column where each row value is the quantile rank of one existing column.

I can use DataFrame.rank to rank a column, but I don't know how to get the quantile number of each ranked value and add it as a new column.

Example: if this is my DataFrame

df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=['a', 'b'])

   a    b
0  1    1
1  2   10
2  3  100
3  4  100

and I'd like to know the quantile number (using 2 quantiles) of column b. I'd expect this result:

   a    b  quantile
0  1    1         1
1  2   10         1
2  3  100         2
3  4  100         2
luca asked Jul 13 '16 15:07



2 Answers

I discovered it is quite easy:

df['quantile'] = pd.qcut(df['b'], 2, labels=False)

   a    b  quantile
0  1    1         0
1  2   10         0
2  3  100         1
3  4  100         1
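
Note that labels=False returns 0-based bin numbers, while the question asked for quantiles numbered from 1. A minimal sketch of one way to get 1-based numbers, reusing the question's example frame, is to add 1 to the result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=['a', 'b'])

# labels=False yields 0-based bin codes; adding 1 gives 1-based quantile numbers.
df['quantile'] = pd.qcut(df['b'], 2, labels=False) + 1
print(df)
```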

It is also worth knowing the "difference between pandas.qcut and pandas.cut".
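
As a rough illustration of that difference (the sample values here are made up for the example and are not from the question): qcut places bin edges at sample quantiles, so bins hold roughly equal numbers of rows, whereas cut places them at equal widths over the value range.

```python
import pandas as pd

# Hypothetical data with one large outlier.
s = pd.Series([1, 2, 3, 100])

# qcut: edges at the sample quantiles (here the median, 2.5).
by_quantile = pd.qcut(s, 2, labels=False)

# cut: edges equally spaced over the range (here split at 50.5).
by_width = pd.cut(s, 2, labels=False)

print(by_quantile.tolist())  # two rows per bin
print(by_width.tolist())     # three rows in the first bin, one in the second
```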

luca answered Sep 30 '22 01:09


df['quantile'] = pd.qcut(df['b'], 2, labels=False) can raise a SettingWithCopyWarning, typically when df is itself a slice of another DataFrame.

The only general way I have found to do this without the warning is:

quantiles = pd.qcut(df['b'], 2, labels=False)
df = df.assign(quantile=quantiles.values)

This will assign the quantile rank values as a new DataFrame column df['quantile'].
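
For context, here is a small sketch of the situation where the warning typically shows up, assuming df was derived by filtering another frame. The names full and subset are illustrative only, and whether the warning is actually emitted depends on the pandas version and its copy-on-write settings.

```python
import pandas as pd

full = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [1, 10, 100, 100, 7]})

# subset is derived from full; assigning a column to it in place
# (subset['quantile'] = ...) is the pattern that may trigger the warning.
subset = full[full['a'] <= 4]

# assign returns a new DataFrame rather than modifying subset in place.
# Passing the Series keeps rows matched by index.
subset = subset.assign(quantile=pd.qcut(subset['b'], 2, labels=False))
print(subset)
```

Calling .copy() on the filtered frame before assigning the column is another common way to make the intent explicit.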

A solution for a more generalized case, in which one wants to partition the cut by multiple columns, is given here.
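
The linked solution is not reproduced here. As a hedged sketch of one common approach to that case (not necessarily the one the link describes), qcut can be applied within each group via groupby and transform. The column name group and the data are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y'],
    'b': [1, 10, 100, 200, 5, 6, 7, 8],
})

# Compute the quantile number of b separately within each group.
# To partition by several columns, pass a list of column names to groupby.
df['quantile'] = df.groupby('group')['b'].transform(
    lambda s: pd.qcut(s, 2, labels=False)
)
print(df)
```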

feetwet answered Sep 30 '22 00:09