Convert classes to numeric in a pandas dataframe

I'm doing a project based on this Kaggle dataset: https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings/data. I need to feed the data into a kNN model, but that can't be done in its current state because the string values first have to be transformed into integers.

get_dummies isn't ideal: the dataset contains a lot of categorical data, so it would create thousands of columns. I am looking for a way to transform strings into numeric representations, for example:

Platform || Critic_Score || Publisher || Global_Sales
Wii      ||      73      ||  Nintendo ||  53
Wii      ||      86      ||  Nintendo ||  60
PC       ||      80      ||Activision ||  30
PS3      ||      74      ||Activision ||  35
Xbox360  ||      81      ||   2K      ||  38

I'd like to transform it into this:

Platform || Critic_Score || Publisher || Global_Sales
  1      ||      73      ||     1     ||  53
  1      ||      86      ||     1     ||  60
  2      ||      80      ||     2     ||  30
  3      ||      74      ||     2     ||  35
  4      ||      81      ||     3     ||  38

I'm using Python 3.

Thanks.

jceg316 asked Dec 14 '22 19:12


1 Answer

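I think you need factorize, which assigns each distinct value an integer code in order of first appearance. The codes start at 0, hence the + 1: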

df['Platform'] = pd.factorize(df['Platform'])[0] + 1
df['Publisher'] = pd.factorize(df['Publisher'])[0] + 1
print(df)
   Platform  Critic_Score  Publisher  Global_Sales
0         1            73          1            53
1         1            86          1            60
2         2            80          2            30
3         3            74          2            35
4         4            81          3            38
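
For a self-contained check, the sketch below rebuilds the five sample rows from the question (the DataFrame construction is an assumption for illustration, not the full Kaggle file). It also keeps the second value returned by pd.factorize, the unique labels, so the integer codes can be mapped back to the original strings.

```python
import pandas as pd

# Sample rows copied from the question; the real dataset is much larger.
df = pd.DataFrame({
    'Platform': ['Wii', 'Wii', 'PC', 'PS3', 'Xbox360'],
    'Critic_Score': [73, 86, 80, 74, 81],
    'Publisher': ['Nintendo', 'Nintendo', 'Activision', 'Activision', '2K'],
    'Global_Sales': [53, 60, 30, 35, 38],
})

# factorize returns (codes, uniques); codes start at 0 in order of first appearance.
codes, uniques = pd.factorize(df['Platform'])
df['Platform'] = codes + 1  # shift so labels start at 1, as in the question

print(df['Platform'].tolist())  # [1, 1, 2, 3, 4]
print(list(uniques))            # ['Wii', 'PC', 'PS3', 'Xbox360']

# Decode: subtract 1 to undo the shift, then index into uniques.
decoded = uniques[df['Platform'].to_numpy() - 1]
print(list(decoded))            # ['Wii', 'Wii', 'PC', 'PS3', 'Xbox360']
```

Keeping uniques around is optional, but without it the integer labels cannot be translated back to platform names later.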

To convert several columns at once:

cols = ['Platform', 'Publisher']
df[cols] = df[cols].apply(lambda x: pd.factorize(x)[0] + 1)

print(df)
   Platform  Critic_Score  Publisher  Global_Sales
0         1            73          1            53
1         1            86          1            60
2         2            80          2            30
3         3            74          2            35
4         4            81          3            38
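
One caveat worth checking on the full dataset: by default pd.factorize gives missing values the code -1, so after adding 1 any NaN ends up as 0. The sketch below uses made-up rows with one missing publisher to show this. It also stores the labels for each column in a dictionary (the name `labels` is hypothetical) so every column can be decoded later.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing publisher, to show how NaN is encoded.
df = pd.DataFrame({
    'Platform': ['Wii', 'PC', 'Wii'],
    'Publisher': ['Nintendo', np.nan, 'Activision'],
})

labels = {}  # column name -> original strings, in code order
for col in ['Platform', 'Publisher']:
    codes, uniques = pd.factorize(df[col])
    df[col] = codes + 1   # missing values: -1 + 1 == 0
    labels[col] = uniques

print(df['Platform'].tolist())   # [1, 2, 1]
print(df['Publisher'].tolist())  # [1, 0, 2]
print(list(labels['Publisher'])) # ['Nintendo', 'Activision']
```

Whether 0 is an acceptable stand-in for "unknown" depends on the model, so it may be safer to fill or drop missing values before encoding.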
jezrael answered Dec 27 '22 03:12