Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

K Nearest Neighbour Algorithm doubt

I am new to Artificial Intelligence. I understand K nearest neighbour algorithm and how to implement it. However, how do you calculate the distance or weight of things that aren't on a scale?

For example, distance of age can be easily calculated, but how do you calculate how near is red to blue? Maybe colours is a bad example because you still can say use the frequency. How about a burger to pizza to fries for example?

I got a feeling there's a clever way to do this.

Thank you in advance for your kind attention.

EDIT: Thank you all for very nice answers. It really helped and I appreciate it. But I am thinking there must be a way out.

Can I do it this way? Let's say I am using my KNN algorithm to do a prediction for a person whether he/she will eat at my restaurant that serves all three of the above food. Of course, there's other factors but to keep it simple, for the field of favourite food, out of 300 people, 150 loves burger, 100 loves pizza, and 50 loves fries. Common sense tells me favourite food affect peoples' decision on whether to eat or not.

So now a person enters his/her favourite food as burger and I am going to predict whether he/she's going to eat at my restaurant. Ignoring other factors, and based on my (training) previous knowledge base, common sense tells me that there's a higher chance the k nearest neighbours' distance for this particular field favourite food is nearer as compared to if he entered pizza or fries.

The only problem with that is that I used probability, and I might be wrong because I don't know and probably can't calculate the actual distance. I also worry about this field putting too much/too little weight on my prediction because the distance probably isn't to scale with other factors (price, time of day, whether the restaurant is full, etc that I can easily quantify) but I guess I might be able to get around it with some parameter tuning.

Oh, everyone put up a great answer, but I can only accept one. In that case, I'll just accept the one with highest votes tomorrow. Thank you all once again.

like image 997
wai Avatar asked Mar 29 '09 17:03

wai


1 Answers

Represent all food for which you collect data as a "dimension" (or a column in a table).

Record "likes" for every person on whom you can collect data, and place the results in a table:

          Burger  |    Pizza  |   Fries   | Burritos |  Likes my food
person1     1     |        0  |       1   |     1    |      1
person2     0     |        0  |       1   |     0    |      0
person3     1     |        1  |       0   |     1    |      1
person4     0     |        1  |       1   |     1    |      0

Now, given a new person, with information about some of the foods he likes, you can measure similarity to other people using a simple measure such as the Pearson Correlation Coefficient, or the Cosine Similarity, etc.

Now you have a way to find K nearest neighbors and make some decision..

For more advanced information on this, look up "collaborative filtering" (but I'll warn you, it gets math-y).

like image 120
SquareCog Avatar answered Oct 27 '22 11:10

SquareCog