Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create a Decile and Quintile columns to rank another variable based on size using Python, Pandas?

I have a data frame with a column containing Investment which represents the amount invested by a trader. I would like to create 2 new columns in the data frame; one giving a decile rank and the other a quintile rank based on the Investment size. I want 1 to represent the decile with the largest Investments and 10 representing the smallest. Smilarly, I want 1 to represent the quintile with the largest investments and 5 representing the smallest.

I am new to Pandas, so is there a way that I can easily do this? Thanks!

like image 906
finstats Avatar asked Oct 21 '14 21:10

finstats


People also ask

How do you create a decile in Pyspark?

Decile Rank of the column by group in pyspark Decile rank of the column by group is calculated by passing argument 10 to ntile() function. we will be using partitionBy() on “Item_group”, orderBy() on “price” column.

What is quantile function in Python?

In Python, the numpy. quantile() function takes an array and a number say q between 0 and 1. It returns the value at the q th quantile. For example, numpy. quantile(data, 0.25) returns the value at the first quartile of the dataset data .


1 Answers

The functionality you're looking for is in pandas.qcut http://pandas.pydata.org/pandas-docs/stable/generated/pandas.qcut.html

In [51]: import numpy as np

In [52]: import pandas as pd

In [53]: investment_df = pd.DataFrame(np.arange(10), columns=['investment'])

In [54]: investment_df['decile'] = pd.qcut(investment_df['investment'], 10, labels=False)

In [55]: investment_df['quintile'] = pd.qcut(investment_df['investment'], 5, labels=False)

In [56]: investment_df
Out[56]: 
   investment  decile  quintile
0           0       0         0
1           1       1         0
2           2       2         1
3           3       3         1
4           4       4         2
5           5       5         2
6           6       6         3
7           7       7         3
8           8       8         4
9           9       9         4   

It's nonstandard to label the largest percentile with the smallest number but you can do this by

In [60]: investment_df['quintile'] = pd.qcut(investment_df['investment'], 5, labels=np.arange(5, 0, -1))

In [61]: investment_df['decile'] = pd.qcut(investment_df['investment'], 10, labels=np.arange(10, 0, -1))

In [62]: investment_df
Out[62]: 
   investment decile quintile
0           0     10        5
1           1      9        5
2           2      8        4
3           3      7        4
4           4      6        3
5           5      5        3
6           6      4        2
7           7      3        2
8           8      2        1
9           9      1        1
like image 106
Dan Frank Avatar answered Oct 10 '22 20:10

Dan Frank