Looking for some advice here. Does anyone know a good place to start looking into matching algorithm in a n-dimensional space. For example, any dating site out there must be using some sort of algorithm to match 2 people. What I have read is that we can map characteristics of a person in a n-dimensional array with a point system for each characteristic. Once we have all (available) characteristics of a person, we can represent this person in a point within a n-dimensional array. Then, to match 2 person would be as simple as finding the shortest distance between 2 point in this n-dim array. Does anyone has any reference in implementation of these kind of algorithm? What's the best language to write these kind of stuff in?
If you want to find the closest match for one person, Bentley & Shamos published a multi-dimensional divide-and-conquer method: Divide-and-conquer in O(N log N) time: Divide-and-conquer in multidimensional space in Proceedings of the eighth annual ACM symposium on Theory of computing 1976. If you can't get a copy this may also be helpful.
However for your example application actually finding the nearest neighbour doesn't seem to be the biggest problem - much trickier is mapping inputs into dimensions. For example if one dimension is "likes animals", what value do you give to someone who likes dogs & cats but can't stand horses? What about someone who loves horses, thinks dogs are OK, is annoyed by cats and is ambivalent about goldfish?
First off, pick the language you are most familiar with. The algorithms for handling this are fairly straightforward, and should work in any modern language. (As long as there is some concept of array, and potentially a matrix library, you should be fine.) I've implemented many of these in C, C++, and C# before, but seen implementations in python, vb.net, etc.
Depending on what you are trying to do, there are a couple of options.
That being said, what you want to do depends on your goals. If you just want to find the best match, you can use simple distance calculations (ie: sqrt of sum of squares for each dimension/property in the n-dimensional array), optionally weight each properties distance, and use the closest point.
If you want to group together people, you'll want to look at clustering algorithms. For data like this, I would suspect that some form of K-Means clustering or fuzzy c-means clustering would work the best.
How about following solution.
Assume the users are U1, U2, U3, U4, U5.... Un. Attributes are A1, A2, A3, A4, A5..... Am
Store these as
A1 - U1, U2, U3... A2 - U4, U6, U7.... A3 -
Profile attribute is the index and stores all users. Now, if a new User comes, see the its attributes and for those attributes, find common people. number of times a person exists in these lists - higher ranking.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With