Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas map string to int based on value in a column

I have a data frame with two columns :

state  total_sales
AL      16714
AR      6498
AZ      107296
CA      33717

Now I want to map the strings in state column to int from 1 to N(where N is the no of rows,here 4 ) based on increasing order of values in total_sales . Result should be stored in another column (say label). That is, wanted a result like this :

state  total_sales label
AL      16714         3
AR      6498          4
AZ      107296        1
CA      33717         2

Please suggest a vectorised implementation .

like image 552
Vijay P R Avatar asked Feb 19 '17 17:02

Vijay P R


2 Answers

You can use rank with cast to int:

df['label'] = df['total_sales'].rank(method='dense', ascending=False).astype(int)
print (df)
  state  total_sales  label
0    AL        16714      3
1    AR         6498      4
2    AZ       107296      1
3    CA        33717      2
like image 118
jezrael Avatar answered Oct 24 '22 21:10

jezrael


One option for converting a column of values to integers is pandas.Categorical.

This actually groups identical values, which in a case like this, where all values are unique each "group" has only one value. The resulting object has a codes attribute, which is a Numpy array of integers indicating which group each input value is in.

Applied to this problem, if you have

In [12]: data = pd.DataFrame({
             'state': ['AL', 'AR', 'AZ', 'CA'],
             'total_sales': [16714, 6498, 107296, 33717]
         })

you can add the label column as described using

In [13]: data['label'] = len(data) - pd.Categorical(data.total_sales, ordered=True).codes
In [14]: print(data)

  state  total_sales  label
0    AL        16714      3
1    AR         6498      4
2    AZ       107296      1
3    CA        33717      2

For this example it is not as fast as jezrael's answer but it has a wide range of applications and it was faster for a larger series I was encoding to integers. It should be noted that if there are two identical values in the total_sales column it will assign them the same label.

like image 28
Steven C. Howell Avatar answered Oct 24 '22 22:10

Steven C. Howell