
How to get integers instead of floats from DataFrame's rank method

I want to replace the numbers in a column with their corresponding ranks:

import pandas as pd
import numpy as np

# random_integers is deprecated (removed in recent NumPy); randint's upper bound is exclusive
numbers = np.random.randint(low=0, high=10001, size=1000)
df = pd.DataFrame({'a': numbers})
df['a_rank'] = df['a'].rank()

The rank method returns float values by default:

987     82.0
988     36.5
989    526.0
990    219.0
991    957.0
992    819.5
993    787.5
994    513.0

Instead of floats I would rather have integers. Converting the float values with astype(int) seems risky: it truncates the fractional part, so a tied rank such as 36.5 would silently become 36, and I worry that distinct float ranks could collapse into the same integer.
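To make the concern concrete, here is a minimal sketch on a small made-up Series (hypothetical sample data, not the random column from the question). It shows where the fractional ranks come from under the default method='average' and what astype(int) does to them:

```python
import pandas as pd

# Hypothetical sample with one tie (two values of 20)
s = pd.Series([10, 20, 20, 30])

avg_rank = s.rank()               # default method='average': ties get the mean of their positions
truncated = avg_rank.astype(int)  # astype(int) drops the fractional part, it does not round

print(avg_rank.tolist())   # [1.0, 2.5, 2.5, 4.0]
print(truncated.tolist())  # [1, 2, 2, 4]
```

The tied values end up at rank 2, and rank 3 is never assigned, so the truncated result is no longer the average rank.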

Is there any way to make the rank method output integers?

alphanumeric asked Oct 25 '16 20:10



1 Answer

The above solution did not work for me, but the following did. The critical line, with my edit, is:

df['a_rank'] = df['a'].rank(method='dense').astype(int)

The difference could be a pandas version issue.
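As a rough illustration of why the cast is safe here: with method='dense', 'min', or 'first', rank still returns floats, but they are always whole numbers, so astype(int) loses nothing. The sketch below compares the three on a made-up Series (hypothetical sample data). Which one fits depends on how ties should be handled.

```python
import pandas as pd

# Hypothetical sample with one tie (two values of 20)
s = pd.Series([10, 20, 20, 30])

dense = s.rank(method='dense').astype(int)  # ties share a rank, ranks stay consecutive
lowest = s.rank(method='min').astype(int)   # ties share the lowest rank, a gap follows
first = s.rank(method='first').astype(int)  # ties are broken by order of appearance

print(dense.tolist())   # [1, 2, 2, 3]
print(lowest.tolist())  # [1, 2, 2, 4]
print(first.tolist())   # [1, 2, 3, 4]
```

If every row needs a unique integer rank, method='first' is the only one of the three that guarantees it.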

whisperer answered Oct 08 '22 01:10
