Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

n-dimensional matching algorithm

Looking for some advice here. Does anyone know a good place to start looking into matching algorithm in a n-dimensional space. For example, any dating site out there must be using some sort of algorithm to match 2 people. What I have read is that we can map characteristics of a person in a n-dimensional array with a point system for each characteristic. Once we have all (available) characteristics of a person, we can represent this person in a point within a n-dimensional array. Then, to match 2 person would be as simple as finding the shortest distance between 2 point in this n-dim array. Does anyone has any reference in implementation of these kind of algorithm? What's the best language to write these kind of stuff in?

like image 333
Herman Avatar asked Mar 24 '09 15:03

Herman


3 Answers

If you want to find the closest match for one person, Bentley & Shamos published a multi-dimensional divide-and-conquer method: Divide-and-conquer in O(N log N) time: Divide-and-conquer in multidimensional space in Proceedings of the eighth annual ACM symposium on Theory of computing 1976. If you can't get a copy this may also be helpful.

However for your example application actually finding the nearest neighbour doesn't seem to be the biggest problem - much trickier is mapping inputs into dimensions. For example if one dimension is "likes animals", what value do you give to someone who likes dogs & cats but can't stand horses? What about someone who loves horses, thinks dogs are OK, is annoyed by cats and is ambivalent about goldfish?

like image 71
danio Avatar answered Sep 20 '22 04:09

danio


First off, pick the language you are most familiar with. The algorithms for handling this are fairly straightforward, and should work in any modern language. (As long as there is some concept of array, and potentially a matrix library, you should be fine.) I've implemented many of these in C, C++, and C# before, but seen implementations in python, vb.net, etc.

Depending on what you are trying to do, there are a couple of options.

That being said, what you want to do depends on your goals. If you just want to find the best match, you can use simple distance calculations (ie: sqrt of sum of squares for each dimension/property in the n-dimensional array), optionally weight each properties distance, and use the closest point.

If you want to group together people, you'll want to look at clustering algorithms. For data like this, I would suspect that some form of K-Means clustering or fuzzy c-means clustering would work the best.

like image 21
Reed Copsey Avatar answered Sep 23 '22 04:09

Reed Copsey


How about following solution.

Assume the users are U1, U2, U3, U4, U5.... Un. Attributes are A1, A2, A3, A4, A5..... Am

Store these as

A1 - U1, U2, U3... A2 - U4, U6, U7.... A3 -

Profile attribute is the index and stores all users. Now, if a new User comes, see the its attributes and for those attributes, find common people. number of times a person exists in these lists - higher ranking.

like image 42
Thoughtful Monkey Avatar answered Sep 20 '22 04:09

Thoughtful Monkey