
Get wrong recommendation with ALS.recommendation

I wrote a Spark program for making recommendations, using the ALS.recommendation library. I ran a small test with the following dataset, called trainData:

(u1, m1, 1)
(u1, m4, 1)
(u2, m2, 1)
(u2, m3, 1)
(u3, m1, 1)
(u3, m3, 1)
(u3, m4, 1)
(u4, m3, 1)
(u4, m4, 1)
(u5, m2, 1)
(u5, m4, 1)

The first column contains the user, the second contains the item rated by the user, and the third contains the rating.

In my code, written in Scala, I trained the model using:

// arguments after trainData: rank = 3, iterations = 5, lambda = 0.01, alpha = 1.0
val myModel = ALS.trainImplicit(trainData, 3, 5, 0.01, 1.0)

I then tried to retrieve some recommendations for u1 using this instruction:

val recommendations = myModel.recommendProducts(idUser, 2)

where idUser contains the ID assigned to user u1. As recommendations, I obtain:

(u1, m1, 1.0536233346170754)
(u1, m4, 0.8540954252858661)
(u1, m3, 0.09069877419040584)
(u1, m2, -0.1345521479521654)

As you can see, the first two lines show that the recommended items are the ones u1 had already rated (m1 and m4). Whichever user I select, I always get the same behavior: the first items recommended are the ones the user has already rated.

I find that weird! Is there a problem somewhere?

semteu asked Dec 02 '16 18:12





1 Answer

I think that is the expected behaviour of recommendProducts. When you train a matrix factorization algorithm such as ALS, you are trying to find a rating that relates each user to each item.

ALS does this based on the items the user has already rated, so when you ask for recommendations for a given user, the model is most confident about the ratings it has already seen, and most of the time it will recommend products that were already rated.

What you need to do is keep a list of the products each user has rated and filter them out when making the recommendations.
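The filtering step could look roughly like the sketch below. It is plain Scala so it runs without a Spark cluster: the Rating case class is a local stand-in for Spark's class of the same name, and topUnseen and alreadyRated are hypothetical names chosen for illustration, not part of the Spark API.

```scala
// Local stand-in for org.apache.spark.mllib.recommendation.Rating (assumption:
// used here only so the sketch runs without Spark on the classpath).
final case class Rating(user: Int, product: Int, rating: Double)

object UnseenRecommendations {
  /** Drops products the user has already rated, then keeps the `n` best remaining ones.
    * `topUnseen` is a hypothetical helper, not a Spark method.
    */
  def topUnseen(scored: Seq[Rating], alreadyRated: Set[Int], n: Int): Seq[Rating] =
    scored
      .filterNot(r => alreadyRated.contains(r.product)) // remove items already rated
      .sortBy(r => -r.rating)                           // highest predicted score first
      .take(n)
}
```

With the scores from the question (assuming u1 has ID 1 and m1 to m4 have IDs 1 to 4), calling topUnseen with alreadyRated = Set(1, 4) leaves m3 and m2. When the scored list comes from recommendProducts, asking for n + alreadyRated.size items guarantees that at least n candidates survive the filter, as long as the model knows that many products.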

EDIT:

I dug a bit into the source code and the documentation to be sure of what I was saying.

recommendProducts is implemented in the class MatrixFactorizationModel (source code). You can see there that, when making recommendations, the model doesn't care whether the user has already rated an item.

Also note that if you are using implicit ratings, you most definitely want to be able to recommend products the user has already implicitly rated: imagine that your implicit ratings are page views of a product in an online store, and what you want is for the user to buy the product.

I don't have access to the book Advanced Analytics with Spark, so I can't comment on the explanations and examples there.
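To illustrate why already-rated items are not excluded, here is a simplified sketch of how a factorization model can rank products for one user: every product is scored by the dot product of the user's and the product's latent factors, and the top n are returned. This is not Spark's actual code; ScoreAllProducts, dot and recommend are hypothetical names, and any factor values passed in are made up.

```scala
object ScoreAllProducts {
  /** Dot product of two equally sized factor vectors. */
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  /** Scores every product for one user and returns the `n` highest (product, score) pairs.
    * Nothing here looks at which products the user rated during training.
    */
  def recommend(userFactors: Array[Double],
                productFactors: Map[Int, Array[Double]],
                n: Int): Seq[(Int, Double)] =
    productFactors.toSeq
      .map { case (product, factors) => (product, dot(userFactors, factors)) }
      .sortBy { case (_, score) => -score } // highest score first
      .take(n)
}
```

Because training pushes the scores of observed user and item pairs towards the observed value, those pairs naturally end up at the top of such a ranking unless they are filtered out afterwards.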

Docs:

  • ALS

  • MatrixFactorizationModel

João Almeida answered Oct 20 '22 10:10
