Kmeans using categorical variables

Tags: python, machine-learning, unsupervised-learning, scikit-learn, data-science

I have a large dataset of 45421 rows by 12 columns in which every variable is categorical; there are no numerical variables. I would like to build an unsupervised clustering model on it, but before modeling I would like to know the best feature selection method for this kind of data. I am also unable to plot an elbow curve for it: I run the k-means elbow method over the range k = 1-1000, but it takes 8-10 hours to execute and the resulting plot does not show an optimal number of clusters. Any suggestions for a better approach would be a great help.

Code:

import pandas as pd

# Small illustrative sample; every column needs the same number of entries
# (the extra fifth 'UserClass' value was dropped so the DataFrame can be built).
data = {'UserName': ['infuk_tof', 'infus_llk', 'infaus_kkn', 'infin_mdx'],
        'UserClass': ['high', 'low', 'low', 'medium'],
        'UserCountry': ['unitedkingdom', 'unitedstates', 'australia', 'india'],
        'UserRegion': ['EMEA', 'EMEA', 'APAC', 'APAC'],
        'UserOrganization': ['INFBLRPR', 'INFBLRHC', 'INFBLRPR', 'INFBLRHC'],
        'UserAccesstype': ['Region', 'country', 'country', 'region']}

df = pd.DataFrame(data)

asked Dec 12 '19 18:12

Praveen

1 Answer

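For categorical data like this, K-means is not an appropriate clustering algorithm: it relies on Euclidean distances and cluster means, neither of which is meaningful for unordered categories. You may want to look at the K-modes method instead, which replaces means with modes and uses the number of mismatched attributes as the distance. It is unfortunately not currently included in the scikit-learn package, but a kmodes package is available on GitHub: https://github.com/nicodv/kmodes, and it follows much of the syntax you're used to from scikit-learn.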

For more, please see the discussion here: https://datascience.stackexchange.com/questions/22/k-means-clustering-for-mixed-numeric-and-categorical-data
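
To make the idea concrete, here is a minimal pure-Python sketch of K-modes, written only for illustration; it is not the implementation used by the kmodes package. The function names (`matching_dissimilarity`, `column_modes`, `k_modes`), the seeded random initialisation, and the lower-cased sample rows are all assumptions made for this example. The `UserName` column is left out because a unique identifier carries no cluster information.

```python
from collections import Counter
import random


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Return the most frequent value of each column (the cluster 'mode')."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(rows, k, max_iter=20, seed=0):
    """Toy k-modes: returns (labels, modes, cost) for a list of tuples."""
    rng = random.Random(seed)
    # Start from k distinct records so that no two initial modes coincide.
    modes = rng.sample(sorted(set(rows)), k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode under the matching dissimilarity.
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        # Update step: recompute each mode; keep the old one if a cluster is empty.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    cost = sum(matching_dissimilarity(r, modes[lab]) for r, lab in zip(rows, labels))
    return labels, modes, cost


# Hypothetical sample rows modelled on the question's columns (values lower-cased).
rows = [
    ("high", "unitedkingdom", "EMEA", "INFBLRPR", "region"),
    ("low", "unitedstates", "EMEA", "INFBLRHC", "country"),
    ("low", "australia", "APAC", "INFBLRPR", "country"),
    ("medium", "india", "APAC", "INFBLRHC", "region"),
]

# Elbow-style scan: total mismatch cost for each candidate number of clusters.
costs = {k: k_modes(rows, k)[2] for k in range(1, len(set(rows)) + 1)}
for k, cost in costs.items():
    print(k, cost)
```

The `costs` dictionary plays the role of the inertia values in a K-means elbow plot. On the real 45421-row dataset it is probably more practical to scan a small range of k (for example 1 to 10 or 20) than 1 to 1000, since every extra value of k means another full clustering run.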

answered Oct 20 '22 00:10

sjc
