 

Mixed variables (categorical and numerical) distance function

I want to fuzzy-cluster a set of jobs. The job attributes are:

  1. Categorical: position, diploma, skills
  2. Numerical: salary, years of experience

My question is: how do I calculate the distance between different jobs?
For example, job1(programmer, bs computer science, (java, .net, responsibility), 1500, 3)
and job2(tester, bs computer science, (black and white box testing), 1200, 1).

PS: I'm a beginner in data mining and clustering, so I would really appreciate your help.
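One common way to handle mixed attributes (not something stated in this thread, so treat it as a suggestion) is Gower's distance: compute a dissimilarity between 0 and 1 for each attribute, then average them. The Python sketch below assumes equal weights, a simple match/mismatch for position and diploma, a Jaccard dissimilarity for the skills set (an extension of my own, since Gower's original coefficient has no set-valued attributes), and absolute differences divided by the attribute's range for salary and years of experience. The `Job` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical record type for the attributes listed in the question.
    position: str
    diploma: str
    skills: frozenset
    salary: float
    years: float


def match_dissim(a, b):
    # Single categorical value: 0 if the categories are equal, 1 otherwise.
    return 0.0 if a == b else 1.0


def jaccard_dissim(a, b):
    # Set-valued attribute: 1 - |A & B| / |A | B|; two empty sets count as identical.
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def range_dissim(x, y, value_range):
    # Numeric attribute: absolute difference scaled by its range over the data set.
    return abs(x - y) / value_range if value_range > 0 else 0.0


def attribute_ranges(jobs):
    # Range (max - min) of salary and of years of experience over all jobs.
    salaries = [j.salary for j in jobs]
    years = [j.years for j in jobs]
    return max(salaries) - min(salaries), max(years) - min(years)


def gower_distance(j1, j2, salary_range, years_range):
    # Equal-weight average of the per-attribute dissimilarities, each in [0, 1].
    parts = [
        match_dissim(j1.position, j2.position),
        match_dissim(j1.diploma, j2.diploma),
        jaccard_dissim(j1.skills, j2.skills),
        range_dissim(j1.salary, j2.salary, salary_range),
        range_dissim(j1.years, j2.years, years_range),
    ]
    return sum(parts) / len(parts)


job1 = Job("programmer", "bs computer science",
           frozenset({"java", ".net", "responsibility"}), 1500, 3)
job2 = Job("tester", "bs computer science",
           frozenset({"black and white box testing"}), 1200, 1)

salary_range, years_range = attribute_ranges([job1, job2])
print(gower_distance(job1, job2, salary_range, years_range))
```

With only these two jobs in the data set, each numeric range equals the two jobs' own difference, so both numeric terms are 1 and the distance is (1 + 0 + 1 + 1 + 1) / 5 = 0.8. With a real data set you would compute the ranges over all jobs. The resulting matrix of pairwise distances can then be passed to any clustering method that accepts precomputed dissimilarities.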

Mariya asked Aug 07 '11 14:08





1 Answer

You could take this as your starting point: http://www.econ.upf.edu/~michael/stanford/maeb4.pdf. Distances between categorical data are explained nicely at the end.
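To make the categorical side concrete, a common approach is to code every category as a 0/1 dummy variable and compare the resulting binary vectors. The Python sketch below is a generic illustration, not a transcription of the linked notes. It shows two standard measures: the simple matching distance (the share of dummy variables on which two jobs differ) and the Jaccard distance, which ignores variables that are 0 for both jobs. The vocabulary entries `("position", "analyst")` and `("skill", "sql")` are hypothetical stand-ins for categories that occur only in other jobs.

```python
def dummy_code(items, vocabulary):
    # One 0/1 dummy variable per (attribute, category) pair in the vocabulary.
    return [1 if pair in items else 0 for pair in vocabulary]


def matching_distance(u, v):
    # Share of dummy variables on which the two objects disagree.
    return sum(a != b for a, b in zip(u, v)) / len(u)


def jaccard_distance(u, v):
    # Like matching, but variables that are 0 for both objects are ignored.
    both = sum(a == 1 and b == 1 for a, b in zip(u, v))
    either = sum(a == 1 or b == 1 for a, b in zip(u, v))
    return 1.0 - both / either if either else 0.0


vocabulary = [
    ("position", "programmer"),
    ("position", "tester"),
    ("position", "analyst"),  # hypothetical: appears only in other jobs
    ("diploma", "bs computer science"),
    ("skill", "java"),
    ("skill", ".net"),
    ("skill", "responsibility"),
    ("skill", "black and white box testing"),
    ("skill", "sql"),  # hypothetical: appears only in other jobs
]

job1 = {("position", "programmer"), ("diploma", "bs computer science"),
        ("skill", "java"), ("skill", ".net"), ("skill", "responsibility")}
job2 = {("position", "tester"), ("diploma", "bs computer science"),
        ("skill", "black and white box testing")}

u = dummy_code(job1, vocabulary)
v = dummy_code(job2, vocabulary)
print(matching_distance(u, v), jaccard_distance(u, v))
```

Because the two hypothetical categories are absent from both jobs, the matching distance counts them as agreements (6 of 9 variables differ, about 0.67), while the Jaccard distance ignores them (6 of 7, about 0.86). Which behavior you want depends on whether a shared absence, such as neither job requiring SQL, should make two jobs look more alike.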

iinception answered Oct 08 '22 12:10
