Clustering on non-numeric dimensions

Question

I recently started working with clustering and the k-means algorithm, and I am trying to come up with a good use case and solve it.

I have the following data about the items sold in different cities.

Item City

Item1 New York
Item2 Charlotte
Item1 San Francisco
...

I would like to cluster the data on the variables city and item to find groups of cities that have similar patterns in the items sold. The problem is that the k-means implementation I use does not accept non-numeric input. Any idea how I should proceed to find a meaningful solution?

Thanks SV

bendaizer · Accepted Answer

Clustering requires a distance definition. A cluster is only a cluster if the items are "closer" according to some distance function. The closer they are, the more likely they belong to the same cluster.
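To make "distance function" concrete for non-numeric data, here is a minimal sketch that compares cities by the set of items sold in each, using Jaccard distance. The rows in `sales` and the helper names `items_by_city` and `jaccard_distance` are made up for illustration; Jaccard is only one possible choice, not the one the answer prescribes.

```python
from collections import defaultdict

# Hypothetical (item, city) rows in the shape of the question's table.
sales = [
    ("Item1", "New York"),
    ("Item2", "New York"),
    ("Item2", "Charlotte"),
    ("Item3", "Charlotte"),
    ("Item1", "San Francisco"),
    ("Item2", "San Francisco"),
]

def items_by_city(rows):
    """Group the item names sold in each city into a set."""
    grouped = defaultdict(set)
    for item, city in rows:
        grouped[city].add(item)
    return dict(grouped)

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|: 0 means identical sets, 1 means disjoint sets."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)

city_items = items_by_city(sales)
d_ny_sf = jaccard_distance(city_items["New York"], city_items["San Francisco"])
d_ny_clt = jaccard_distance(city_items["New York"], city_items["Charlotte"])
```

In this toy data New York and San Francisco sell the same items, so they are "closer" than New York and Charlotte. Note that k-means itself works with means of numeric vectors, so a set-based distance like this one would more naturally feed a method that accepts pairwise distances, such as hierarchical clustering.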

In your case, you can try clustering on various numeric data related to the cities, such as their geographical coordinates or demographic information, and see whether the resulting clusters overlap across the different cases.
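As a rough sketch of that suggestion, the snippet below runs a plain k-means (Lloyd's algorithm) on approximate city coordinates. Everything here is an assumption for illustration: the `city_features` values, the `kmeans` helper, the choice of `k=2`, and the shortcut of treating latitude and longitude as flat Euclidean coordinates. In practice a library implementation would likely replace the hand-written loop.

```python
import math

# Approximate (latitude, longitude) pairs; illustrative values only.
city_features = {
    "New York": (40.71, -74.01),
    "Charlotte": (35.23, -80.84),
    "San Francisco": (37.77, -122.42),
}

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm, seeded with the first k points (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

names = list(city_features)
labels, centroids = kmeans([city_features[n] for n in names], k=2)
clusters = dict(zip(names, labels))
```

The same function could be run again on a different feature set (for example, demographic figures per city), and the two resulting `clusters` mappings compared to see whether the same cities end up grouped together.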

Tags:

cluster-analysis

k-means

Neo_32
