
Recommendation engine without ratings

I have found what must be dozens of articles on Towards Data Science, Medium, etc. of people building recommendation engines with IMDb data (based on the ratings that users gave to movies, which movies should we recommend to those users?). These articles begin with 'memory-based approaches': user-based collaborative filtering and item-based collaborative filtering. I have been tasked with making a recommendation engine, and since none of the suits really care or know anything about this, I want to do the bare minimum (which seems to be user-based collaborative filtering).

The problem is that all of my data is binary: there are no ratings, only which items each user bought, and the question is whether to recommend those items to similar users. This is actually the situation shown in the cartoons that all of the Medium articles have stolen from each other, but none of those articles give an example of how to do it.

All of the articles use Pearson correlation or cosine similarity to determine user similarity. Can I use these approaches with binary dimensions (bought or not)? If so, how? If not, is there a different way to measure user similarity?

I am working with Python, by the way. I was also thinking of using Hamming distance; is there a reason that wouldn't be good?

amchugh89 asked Nov 22 '19 15:11



3 Answers

  • Similarity-score-based approaches work even with binary dimensions. With ratings, two similar users may look like [5,3,2,0,1] and [4,3,3,0,0], whereas in your case they would look something like [1,1,1,0,1] and [1,1,1,0,0].
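>>> from scipy.spatial.distance import cosine
>>> # cosine() returns a distance, so similarity is 1 - distance
>>> 1 - cosine([5, 3, 2, 0, 1], [4, 3, 3, 0, 0])  # rating vectors
0.961161313666907
>>> 1 - cosine([1, 1, 1, 0, 1], [1, 1, 1, 0, 0])  # binary purchase vectors
0.8660254037844386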
  • Another approach: if you can get the number of times a user bought a product, that count can be used as a rating, and similarities can then be calculated from the counts.
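To make the binary case concrete, here is a minimal sketch of user-based filtering on 0/1 purchase vectors. The function names and the toy `purchases` data are hypothetical, made up for illustration. It also includes Jaccard similarity and a Hamming-based similarity for comparison: Hamming counts items that neither user bought as agreement, so with a sparse purchase matrix two users with nothing in common can still look similar.

```python
from math import sqrt

def cosine_sim(a, b):
    """Cosine similarity of two equal-length 0/1 vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(a)) * sqrt(sum(b))  # for 0/1 vectors, sum(x*x) == sum(x)
    return dot / norm if norm else 0.0

def jaccard_sim(a, b):
    """Items bought by both users divided by items bought by either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def hamming_sim(a, b):
    """Share of positions that agree, including items neither user bought."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def recommend(purchases, user, sim=cosine_sim, top_n=3):
    """Rank items the user has not bought by similarity-weighted votes."""
    target = purchases[user]
    scores = [0.0] * len(target)
    for other, row in purchases.items():
        if other == user:
            continue
        weight = sim(target, row)
        for item, bought in enumerate(row):
            if bought and not target[item]:
                scores[item] += weight
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] > 0][:top_n]

# Hypothetical toy data: one row per user, one column per item (0..5).
purchases = {
    "ann": [1, 1, 1, 0, 0, 0],
    "bob": [1, 1, 1, 1, 0, 0],
    "cat": [0, 0, 0, 0, 1, 1],
}
print(recommend(purchases, "ann"))  # [3]
```

Here item 3 is recommended to "ann" because "bob", the only user with overlapping purchases, bought it. Swapping in `sim=jaccard_sim` gives the same ranking on this toy data.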
Arun Joy Thekkiniyath answered Jan 02 '23 07:01



Your data is implicit feedback: an interaction does not necessarily indicate the user's interest, only that an interaction happened. An interaction value of 1 and an interaction value of 1000 make no difference in this case; both show only that an interaction took place, so memory-based algorithms are useless here. If you are not familiar with neural networks, you should at least use matrix factorization techniques to make meaningful recommendations from this data. You can start with the Surprise library, which has a bunch of matrix factorization models.

It is better to use ALS as the optimization technique, but SGD will also do the job. If you are OK with deep learning, I can point you to the sources of the best work so far.

I once used the non-negative matrix factorization (NMF) algorithm in Surprise on data like yours, and the results were good enough.
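To illustrate the idea behind non-negative matrix factorization, here is a rough pure-Python sketch. It is not Surprise's implementation: it uses the classic multiplicative update rules and treats every non-purchase as an observed 0, which is a simplifying assumption. The function names and the toy `purchases` matrix are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def squared_error(r, w, h):
    """Sum of squared differences between r and the product w @ h."""
    approx = matmul(w, h)
    return sum((r[i][j] - approx[i][j]) ** 2
               for i in range(len(r)) for j in range(len(r[0])))

def nmf(r, rank=2, iters=200, seed=0, eps=1e-9):
    """Factor r (users x items) into non-negative w (users x rank) and h (rank x items)."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in r]
    h = [[rng.random() + 0.1 for _ in r[0]] for _ in range(rank)]
    for _ in range(iters):
        # Multiplicative update: h <- h * (w^T r) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, r), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(len(h[0]))]
             for k in range(rank)]
        # Multiplicative update: w <- w * (r h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(r, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(len(w))]
    return w, h

# Hypothetical toy data: rows are users, columns are items, 1 = bought.
purchases = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
w, h = nmf(purchases)
scores = matmul(w, h)
# Reconstructed scores for user 1; unbought items with high scores are candidates.
print([round(s, 2) for s in scores[1]])
```

The reconstructed score for an item the user has not bought serves as the recommendation score; in practice a library implementation would be the more sensible choice.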

Abdirahman answered Jan 02 '23 06:01



In your situation, the best approach seems to be collaborative filtering. You don't need scores; all you need is a user-item interaction matrix. The simplest algorithm in this case is Alternating Least Squares (ALS).

There are already a few implementations in Python. For instance, this one. There is also an implementation in the PySpark recommendation module.
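For intuition about what ALS does, here is a small pure-Python sketch under simplifying assumptions: it is plain regularized ALS that treats every non-purchase as an observed 0, not the confidence-weighted implicit-feedback variant that dedicated libraries typically offer. All function names, parameter defaults, and the toy `purchases` matrix are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def update(fixed, targets, reg):
    """Ridge-regress each target row onto the fixed factor matrix."""
    k = len(fixed[0])
    # Gram matrix fixed^T fixed + reg * I, shared by every row being solved.
    gram = [[sum(f[p] * f[q] for f in fixed) + (reg if p == q else 0.0)
             for q in range(k)] for p in range(k)]
    return [solve(gram, [sum(f[p] * t for f, t in zip(fixed, row)) for p in range(k)])
            for row in targets]

def loss(r, users, items, reg):
    """Squared reconstruction error plus L2 penalty on both factor matrices."""
    err = sum((r[i][j] - sum(a * b for a, b in zip(users[i], items[j]))) ** 2
              for i in range(len(r)) for j in range(len(r[0])))
    penalty = sum(x * x for row in users + items for x in row)
    return err + reg * penalty

def als(r, rank=2, reg=0.1, iters=10, seed=0):
    """Alternate between solving for user factors and item factors."""
    rng = random.Random(seed)
    users = [[rng.random() for _ in range(rank)] for _ in r]
    items = [[rng.random() for _ in range(rank)] for _ in r[0]]
    r_t = [list(col) for col in zip(*r)]
    for _ in range(iters):
        users = update(items, r, reg)    # fix items, solve for users
        items = update(users, r_t, reg)  # fix users, solve for items
    return users, items

# Hypothetical toy data: rows are users, columns are items, 1 = bought.
purchases = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
users, items = als(purchases)
# Predicted scores for user 1 across all items.
print([round(sum(a * b for a, b in zip(users[1], v)), 2) for v in items])
```

Each half-step is an ordinary ridge regression with a closed-form solution, which is why ALS needs no learning rate. For real data, a library implementation is likely the better choice.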

Danylo Baibak answered Jan 02 '23 06:01
