I want to fuzzy cluster a set of jobs. The job attributes are:
My question is: how to calculate the distance between different jobs?
e.g. job1(programmer, BS computer science, (java, .net, responsibility), 1500, 3)
and job2(tester, BS computer science, (black and white box testing), 1200, 1)
PS: I'm a beginner in data mining and clustering, so I would highly appreciate your help.
Which measure of central tendency can be used for both numerical and categorical variables? The mode. The mean requires numerical values, so it cannot be computed for categorical variables.
Calculating distance: a popular choice for clustering is Euclidean distance, which applies to numerical attributes.
Method 1: Assign each value of a category as a binary dummy variable. For example, suppose we have two categorical variables, Gender and Mode. Each value of Gender and each value of Mode becomes its own binary dummy variable. The distance between two objects is then the ratio of the number of unmatched dummy variables to the total number of dummy variables.
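A minimal sketch of Method 1 in Python may make the ratio concrete. The category values ("male", "female", "bus", "train", "van") and the function names are assumptions made up for illustration; the original text only names the variables Gender and Mode.

```python
def dummy_encode(record, categories):
    """Expand a record of categorical values into a flat list of 0/1 dummies."""
    bits = []
    for variable, values in categories.items():
        bits.extend(1 if record[variable] == v else 0 for v in values)
    return bits


def dummy_distance(a, b, categories):
    """Ratio of unmatched dummy variables to the total number of dummy variables."""
    x = dummy_encode(a, categories)
    y = dummy_encode(b, categories)
    unmatched = sum(1 for p, q in zip(x, y) if p != q)
    return unmatched / len(x)


# Hypothetical category values, assumed for this example only.
categories = {"gender": ["male", "female"], "mode": ["bus", "train", "van"]}

a = {"gender": "male", "mode": "bus"}
b = {"gender": "female", "mode": "bus"}
c = {"gender": "female", "mode": "train"}

d_ab = dummy_distance(a, b, categories)  # 2 of 5 dummies differ
d_ac = dummy_distance(a, c, categories)  # 4 of 5 dummies differ
print(d_ab, d_ac)
```

Note that a single mismatching variable flips two dummies (one switches off, one switches on), so variables with many values do not count more per mismatch, but the denominator grows with the total number of category values.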
Distance in miles is a quantitative variable because it takes on numerical values with meaningful magnitudes and equal intervals.
You may take this as your starting point: http://www.econ.upf.edu/~michael/stanford/maeb4.pdf. Distance between categorical data is nicely explained at the end.
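For the two jobs in the question, one possible approach (not necessarily the one in the linked notes) is a Gower-style average: compute a dissimilarity between 0 and 1 for each attribute and take the mean. The sketch below is only an illustration. The attribute names (title, degree, skills, salary, experience) are guesses at what the five fields mean, and the ranges used to scale the numeric attributes are made-up values that would normally come from the full data set.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets; 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def job_distance(j1, j2, ranges):
    """Average of per-attribute dissimilarities, each scaled to [0, 1]."""
    parts = [
        0.0 if j1["title"] == j2["title"] else 1.0,    # categorical: simple mismatch
        0.0 if j1["degree"] == j2["degree"] else 1.0,  # categorical: simple mismatch
        jaccard_distance(j1["skills"], j2["skills"]),  # set-valued attribute
        abs(j1["salary"] - j2["salary"]) / ranges["salary"],              # numeric
        abs(j1["experience"] - j2["experience"]) / ranges["experience"],  # numeric
    ]
    return sum(parts) / len(parts)


# Field names are assumptions; the values come from the question's example.
job1 = {"title": "programmer", "degree": "bs computer science",
        "skills": {"java", ".net", "responsibility"},
        "salary": 1500, "experience": 3}
job2 = {"title": "tester", "degree": "bs computer science",
        "skills": {"black and white box testing"},
        "salary": 1200, "experience": 1}

# Hypothetical ranges (max - min over the whole data set), assumed here.
ranges = {"salary": 2000, "experience": 10}

d = job_distance(job1, job2, ranges)
print(round(d, 2))  # (1 + 0 + 1 + 0.15 + 0.2) / 5 = 0.47
```

A pairwise dissimilarity like this is not Euclidean, so it suits clustering methods that accept a distance matrix rather than ones that need to average raw attribute values.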