
Turning a column of strings into a column of integers in Pandas

Tags: python, pandas

I'm trying to turn a column of strings into integer identifiers, and I cannot find an elegant way of doing this in pandas (or Python). In the following example, I convert "A", a column of strings, into numbers through a mapping, but it looks like a dirty hack to me:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['homer_simpson', 'mean_street', 'homer_simpson', 'bla_bla'], 'B': 4})

unique = df['A'].unique()
mapping = dict(zip(unique, np.arange(len(unique))))

new_df = df.replace({'A': mapping})

Is there a better, more direct, way of achieving this?

manu asked Mar 13 '23 20:03



1 Answer

How about using factorize?

>>> labels, uniques = df.A.factorize()
>>> df.A = labels
>>> df
   A  B
0  0  4
1  1  4
2  0  4
3  2  4

http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.factorize.html
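For completeness, here is a minimal sketch of the same idea that keeps the original column and shows how to get the strings back from the codes. The column names `A_id` and `A_cat` and the variable `restored` are only illustrative. The last line shows a related alternative, the categorical dtype, which (as far as I know) numbers values by sorted category order rather than by order of first appearance, so the two columns can differ.

```python
import pandas as pd

df = pd.DataFrame({'A': ['homer_simpson', 'mean_street', 'homer_simpson', 'bla_bla'], 'B': 4})

# factorize assigns integer codes in order of first appearance
codes, uniques = pd.factorize(df['A'])
df['A_id'] = codes  # illustrative column name

# uniques holds the original strings, indexed by code,
# so the codes can be mapped back to the labels
restored = uniques.take(codes)

# Alternative: categorical codes follow the sorted order of the categories
df['A_cat'] = df['A'].astype('category').cat.codes  # illustrative column name
```

Keeping `uniques` around is useful if you later need to translate the integer identifiers back into the original strings.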

satomacoto answered May 03 '23 19:05
