
How does R's recommenderlab calculate the ratings of each item in a ratingMatrix?

Recently, I started using R's recommenderlab package in my studies.
This is the recommenderlab documentation:

http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf

There are some examples in this document, but I have a big question.

  • First, load the recommenderlab package and the Jester5k data set.

    library("recommenderlab")
    data(Jester5k)
    
  • Use the first 1000 records (users) of Jester5k for training. The recommendation algorithm is POPULAR.

    r <- Recommender(Jester5k[1:1000], method="POPULAR")
    
  • Then predict the recommendation list for the 1001st user. List the top 5 items.

    recom <- predict(r, Jester5k[1001], n=5)
    as(recom, "matrix")
    

output:

[1] "j89" "j72" "j47" "j93" "j76"

  • Then I check the ratings of the 5 items above.

    rating <- predict(r, Jester5k[1001], type="ratings")
    as(rating, "matrix")[,c("j89", "j72", "j47", "j93", "j76")]
    

output:

      j89       j72       j47       j93       j76
2.6476613 2.1273894 0.5867006 1.2997065 1.2956333

Why is the top 5 list "j89" "j72" "j47" "j93" "j76", when j47's rating is only 0.5867006?

I do not understand.

How does recommenderlab calculate the ratings of each item in ratingMatrix?

And how does it produce the TopN list?

Ciphero Chen, asked Nov 13 '22 01:11


1 Answer

To get a clearer picture of your issue, I suggest reading "recommenderlab: A Framework for Developing and Testing Recommendation Algorithms".

Why is the top 5 list "j89" "j72" "j47" "j93" "j76"?

You are using the POPULAR method. This means the top 5 list is chosen from the most rated items (by counting how many ratings each item has received), not from the items with the highest predicted rating.
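To make the distinction concrete, here is a minimal base R sketch on a toy matrix. It does not call recommenderlab and is not its implementation; the matrix values and the names `ratings`, `rating_count` and `mean_rating` are invented for illustration. It only shows that ranking items by how often they were rated can give a different order than ranking them by average rating.

```r
# Toy ratings matrix: rows are users, columns are items, NA = not rated.
# All values are invented for illustration; this is not Jester5k data.
ratings <- matrix(
  c( 5, NA,  1,
    NA,  4,  2,
    NA, NA,  1,
     4, NA,  2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), c("a", "b", "c"))
)

# Popularity in the sense described above: how many users rated each item.
rating_count <- colSums(!is.na(ratings))

# Average observed rating per item.
mean_rating <- colMeans(ratings, na.rm = TRUE)

# The two criteria produce different orderings.
by_count <- names(sort(rating_count, decreasing = TRUE))
by_mean  <- names(sort(mean_rating, decreasing = TRUE))

print(by_count)  # "c" "a" "b"
print(by_mean)   # "a" "b" "c"
```

Item `c` is rated by everyone but scores low, so it leads the count-based list while coming last by mean rating. That is the same kind of mismatch as j47 in the question.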

How does recommenderlab calculate the ratings of each item in ratingMatrix? And how does it produce the TopN list?

As for the predicted ratings, recommenderlab calculates them using the usual distance methods (I have not had the chance to check whether it is Pearson or cosine). It then determines the rating as suggested by Breese et al. (1998): the mean rating plus a weighted factor calculated over the neighborhood. You can consider the entire training set as the neighborhood of any user, which is why the predicted ratings for any user on the same item will have the same value.
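The "mean rating plus a weighted factor" idea can be sketched in base R as follows. This is only an illustration under stated assumptions, not recommenderlab's code: every training user is treated as a neighbour with equal weight, the toy matrix `train` is invented, and `predict_ratings` is a hypothetical helper. Under those assumptions the neighbourhood term depends only on the item, so it is identical for every user.

```r
# Toy training ratings (invented values): rows = users, columns = items.
train <- matrix(
  c( 4,  2, NA,
     5, NA,  1,
    NA,  3,  2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("u", 1:3), c("a", "b", "c"))
)

# Each training user's mean rating over the items they rated.
user_means <- rowMeans(train, na.rm = TRUE)

# Deviation of every observed rating from that user's own mean.
deviations <- sweep(train, 1, user_means, "-")

# Assumption: all training users are neighbours with equal weight, so the
# weighted factor reduces to the average deviation per item.
item_offset <- colMeans(deviations, na.rm = TRUE)

# Hypothetical helper: predicted rating = user's mean + per-item offset.
predict_ratings <- function(active_user_mean) active_user_mean + item_offset

print(item_offset)         # a = 1.5, b = -0.25, c = -1.25
print(predict_ratings(3))  # a = 4.5, b = 2.75, c = 1.75
```

With real similarity weights (Pearson or cosine) the offsets would differ from user to user; the equal-weight case is used here only because it matches the "entire training set as the neighborhood" reading.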

My best. L

theLudo, answered Nov 15 '22 05:11