To substitute the numbers with their corresponding "ranks":
import pandas as pd
import numpy as np
# random_integers is deprecated; randint excludes the upper bound, hence 10001
numbers = np.random.randint(low=0, high=10001, size=1000)
df = pd.DataFrame({'a': numbers})
df['a_rank'] = df['a'].rank()
I am getting float values, which is the default output type of the rank method:
987 82.0
988 36.5
989 526.0
990 219.0
991 957.0
992 819.5
993 787.5
994 513.0
Instead of floats, I would rather have integers. Rounding the resulting float values and converting them with astype(int) would be risky, since rounding could introduce duplicates from float values that are too close to each other, such as 3.5 and 4.0. Both of those would round to the integer 4.
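To make the concern concrete, here is a minimal sketch on a small made-up Series (the values are hypothetical and chosen only so that one tie occurs). It shows where the .5 ranks come from with the default method='average', and that astype(int) on its own truncates rather than rounds:

```python
import pandas as pd

# Hypothetical data for illustration: the two 20s are tied.
s = pd.Series([10, 20, 20, 30])

# Default method='average': tied values share the mean of their positions (2 and 3).
avg_rank = s.rank()
print(avg_rank.tolist())   # [1.0, 2.5, 2.5, 4.0]

# astype(int) drops the fractional part; it does not round.
truncated = avg_rank.astype(int)
print(truncated.tolist())  # [1, 2, 2, 4]
```

So the fractional ranks only appear for tied values, and any conversion to int has to decide how those ties should be represented.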
Is there any way to make the rank method output integers?
The above solution did not work for me. The following did work though. The critical line with edits is:
df['a_rank'] = df['a'].rank(method='dense').astype(int)
This could be a version issue.
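For comparison, here is a small sketch of how a few tie-handling methods of rank behave once converted with astype(int). The Series is made-up illustration data, not the data from the question, and which method is appropriate depends on how ties should be treated:

```python
import pandas as pd

# Hypothetical data for illustration: the two 20s are tied.
s = pd.Series([10, 20, 20, 30])

# 'dense': tied values share a rank and the next rank follows without a gap.
dense = s.rank(method='dense').astype(int)
print(dense.tolist())   # [1, 2, 2, 3]

# 'first': ties are broken by order of appearance, so every rank is unique.
first = s.rank(method='first').astype(int)
print(first.tolist())   # [1, 2, 3, 4]

# 'min': tied values share the lowest rank of the group, leaving a gap after it.
lowest = s.rank(method='min').astype(int)
print(lowest.tolist())  # [1, 2, 2, 4]
```

All three methods produce whole-number ranks, so astype(int) loses nothing here. One caveat, as far as I know: if the column contains NaN, rank keeps NaN for those rows by default, and those would need handling before astype(int).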