
Binary recommendation algorithms

I'm currently doing some research for a school assignment. I have two data streams from a webshop: user ratings, and search, click and order history (binary data).

I found that collaborative filtering is the best family of algorithms if you are using rating data. I found and researched these algorithms:

Memory-based

  1. user-based

    • Pearson correlation
    • constrained Pearson
    • vector similarity (cosine)
    • mean squared difference
    • weighted Pearson
    • correlation threshold
    • max number of neighbours
    • weighted by correlation
    • Z-score normalization
  2. item-based

    • adjusted cosine
    • maximum number of neighbours
  3. similarity fusion

Model-based

  1. regression based
  2. slope one
  3. LSI/SVD
  4. regularized SVD (RSVD/RSVD2/NSVD2/SVD++)
  5. integrated neighbor based
  6. cluster based smoothing

Now I'm looking for a way to use the binary data, but I'm having a hard time figuring out whether these algorithms can work with binary data instead of rating data, or whether there is a different family of algorithms I should be looking at.

I apologize in advance for spelling errors, since I have dyslexia and am not a native writer. Thanks marc_s for helping.

DutchGuy asked Oct 19 '22 02:10



1 Answer

Take a look at data mining algorithms such as association rule mining (also known as market basket analysis). You've come upon a tough problem in recommender systems: unary and binary data are common, but the best personalization algorithms don't work well with them.

Rating data can express a degree of preference for a single user-item pair; e.g., I rate this movie 4 stars out of 5. Binary data is the least granular type of rating data: I either like or don't like something, or I have or have not consumed it.

Be careful not to confuse binary and unary data. Unary data tells you that a user consumed something (coded as 1, as in binary data), but it gives you no information about whether the user disliked or chose not to consume something (coded as NULL, where binary data would have a 0). For instance, you may know that a person viewed 10 web pages, but you have no idea what she would have thought of other pages had she known they were available. That's unary data, and you can't infer any preference information from a NULL.
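To make the association rule idea concrete, here is a minimal sketch in plain Python. It is not a full Apriori implementation: it only mines rules between pairs of items, and the order data, the function name `pair_rules` and the two thresholds are all made up for illustration. Each order is treated as unary data, i.e. a set of items that were bought, with no claim about the items that were not.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history (assumption, not real data). Each set holds the
# items of one order: unary data, we only know what was bought.
orders = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules between pairs of items.

    support    = share of baskets that contain both items
    confidence = share of baskets with lhs that also contain rhs
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        # Sort so that each pair is always counted under the same key.
        pair_counts.update(combinations(sorted(basket), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


rules = pair_rules(orders)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as "jam -> bread" can then be read as "people who ordered jam also tended to order bread", which only needs the 1s in the data and never has to interpret a missing entry as a dislike. For larger catalogues you would probably want a proper Apriori or FP-growth implementation rather than counting every pair like this.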

Dan Jarratt answered Oct 22 '22 21:10
