Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas DENSE RANK

Tags:

python

sql

pandas

I'm dealing with pandas dataframe and have a frame like this:

Year Value  
2012  10
2013  20
2013  25
2014  30

I want to make an equialent to DENSE_RANK () over (order by year) function. to make an additional column like this:

    Year Value Rank
    2012  10    1
    2013  20    2
    2013  25    2
    2014  30    3

How can it be done in pandas?

Thanks!

like image 448
Keithx Avatar asked Sep 06 '16 21:09

Keithx


3 Answers

Use pd.Series.rank with method='dense'

df['Rank'] = df.Year.rank(method='dense').astype(int)

df

enter image description here

like image 112
piRSquared Avatar answered Oct 20 '22 08:10

piRSquared


The fastest solution is factorize:

df['Rank'] = pd.factorize(df.Year)[0] + 1

Timings:

#len(df)=40k
df = pd.concat([df]*10000).reset_index(drop=True)

In [13]: %timeit df['Rank'] = df.Year.rank(method='dense').astype(int)
1000 loops, best of 3: 1.55 ms per loop

In [14]: %timeit df['Rank1'] = df.Year.astype('category').cat.codes + 1
1000 loops, best of 3: 1.22 ms per loop

In [15]: %timeit df['Rank2'] = pd.factorize(df.Year)[0] + 1
1000 loops, best of 3: 737 µs per loop
like image 26
jezrael Avatar answered Oct 20 '22 09:10

jezrael


You can convert the year to categoricals and then take their codes (adding one because they are zero indexed and you wanted the initial value to start with one per your example).

df['Rank'] = df.Year.astype('category').cat.codes + 1

>>> df
   Year  Value  Rank
0  2012     10     1
1  2013     20     2
2  2013     25     2
3  2014     30     3
like image 5
Alexander Avatar answered Oct 20 '22 08:10

Alexander