How to compute AUC (Area Under Curve) for recommendation system evaluation

I am confused about how to compute AUC (area under curve) to evaluate recommendation system results.

If we have cross-validation data like (user, product, rating), how should we choose positive and negative samples for each user in order to compute AUC?

Is it reasonable to treat the products that occur for each user in the dataset as positive samples, and all products that do not occur as negative samples? I don't think this approach finds the "real" negative samples, because the user may well like some of the products placed in the negative set.

Jaming LAM asked Jan 20 '17


People also ask

How is AUC calculated?

Area under curve (AUC) is also known as the c-statistic. Some statisticians also call it AUROC, which stands for area under the receiver operating characteristic. It is calculated by adding the concordance percent and 0.5 times the tied percent.
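To make the concordance definition concrete, here is a minimal sketch in Python; the score lists are made up for illustration and are not from the snippet above:

```python
# Hypothetical illustration of AUC as concordance: for every (positive, negative)
# pair, check whether the positive item is scored higher (concordant),
# lower (discordant), or equal (tied). The scores below are invented.
pos_scores = [0.9, 0.7, 0.3]   # scores given to items the user actually liked
neg_scores = [0.8, 0.3, 0.2]   # scores given to items the user did not like

concordant = tied = 0
for p in pos_scores:
    for n in neg_scores:
        if p > n:
            concordant += 1
        elif p == n:
            tied += 1

total_pairs = len(pos_scores) * len(neg_scores)
# c-statistic = concordant fraction + 0.5 * tied fraction
auc = (concordant + 0.5 * tied) / total_pairs
print(auc)  # (6 + 0.5 * 1) / 9 ~= 0.72 for this toy data
```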

How do you calculate AUC manually?

You can divide the space into two parts: a triangle and a trapezium. The triangle has area TPR*FPR/2, and the trapezium has area (1-FPR)*(1+TPR)/2 = 1/2 - FPR/2 + TPR/2 - TPR*FPR/2. The total area is 1/2 - FPR/2 + TPR/2. This is how you can get the AUC when you have just two points.
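A minimal sketch of that two-point calculation, assuming a single (FPR, TPR) operating point; the example values are made up:

```python
def auc_from_single_point(fpr: float, tpr: float) -> float:
    """AUC of the ROC curve through (0, 0), (fpr, tpr), (1, 1).

    Split the area into a triangle under the first segment and a
    trapezium under the second, as described above.
    """
    triangle = fpr * tpr / 2.0
    trapezium = (1.0 - fpr) * (1.0 + tpr) / 2.0
    return triangle + trapezium  # simplifies to 1/2 - fpr/2 + tpr/2

# e.g. TPR = 0.8, FPR = 0.2  ->  1/2 - 0.1 + 0.4 = 0.8
print(auc_from_single_point(0.2, 0.8))
```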

How do you calculate AUC in ROC curve?

ROC AUC is the area under the ROC curve and is often used to evaluate the ordering quality of two classes of objects by an algorithm. It is clear that this value lies in the [0,1] segment. In our example, ROC AUC value = 9.5/12 ~ 0.79.
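For reference, a toy illustration of computing this area with scikit-learn's roc_auc_score; the labels and scores below are invented and are not the 9.5/12 example referred to above:

```python
# ROC AUC as an ordering measure: the probability that a randomly chosen
# positive item is scored above a randomly chosen negative item.
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # 1 = relevant, 0 = irrelevant
y_score = [0.9, 0.3, 0.8, 0.4, 0.5, 0.2, 0.7, 0.1]   # model scores (higher = more relevant)

print(roc_auc_score(y_true, y_score))  # 0.9375 for this toy data
```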

Why AUC is calculated?

The Area Under the Curve (AUC) is the measure of the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve. The higher the AUC, the better the performance of the model at distinguishing between the positive and negative classes.


1 Answer

"A ROC curve plots recall (true positive rate) against fallout (false positive rate) for increasing recommendation set size." Schröder, Thiele, and Lehner 2011 (PDF)

In general, you will hold out a portion of your data as testing data. For a particular user, you would train on (for instance) 80% of her data, predict which items (out of all items in your dataset) she will exhibit a preference for, and evaluate those predictions against the remaining 20% of her data.

Let's say you're building a Top-20 recommender. The 20 items you recommend for a user are the Positive items, and every item you don't recommend is Negative. Comparing those labels against the user's held-out testing set gives the confusion matrix:

- True Positives: items in your Top-N list that the user preferred in her held-out testing set.
- False Positives: items in your Top-N list that she did not prefer in her held-out testing set.
- True Negatives: items you did not recommend and that she did not prefer in her held-out testing set.
- False Negatives: items you did not recommend but that she did prefer in her held-out testing set.

Now vary the number of items you recommend, calculate the confusion matrix for each size, compute recall and fallout for each, and plot the ROC. The AUC is the area under that curve.
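To tie this together, here is a minimal per-user sketch of the procedure in Python. It assumes you already have a ranked recommendation list and the user's held-out preferred items; the names ranked_items, relevant, and catalog_size are illustrative, not from the answer:

```python
import numpy as np

def user_roc_auc(ranked_items, relevant, catalog_size):
    """ROC AUC for one user, computed by sweeping the recommendation list size N.

    ranked_items : list of item ids, best recommendation first
    relevant     : set of item ids the user preferred in the held-out test data
    catalog_size : total number of items that could have been recommended
    """
    n_pos = len(relevant)             # held-out preferred items (assumed non-empty)
    n_neg = catalog_size - n_pos      # everything else counts as negative
    tprs, fprs = [0.0], [0.0]         # the ROC curve starts at (0, 0)

    tp = fp = 0
    for item in ranked_items:         # grow the Top-N list one item at a time
        if item in relevant:
            tp += 1                   # recommended and preferred -> true positive
        else:
            fp += 1                   # recommended but not preferred -> false positive
        tprs.append(tp / n_pos)       # recall at this N
        fprs.append(fp / n_neg)       # fallout at this N

    tprs.append(1.0)                  # if the ranking is truncated, interpolate
    fprs.append(1.0)                  # linearly to the end point (1, 1)
    return np.trapz(tprs, fprs)       # area under the curve (trapezoidal rule)

# Toy data: catalog of 10 items, the user's held-out preferences are items 2 and 7.
auc = user_roc_auc(ranked_items=[2, 5, 7, 1, 9], relevant={2, 7}, catalog_size=10)
print(auc)
```

The jump straight to (1, 1) simply interpolates over the part of the catalog that was never ranked; if your recommender scores every item, you can let the loop run over the full ranking instead. Averaging this per-user AUC over all test users gives an overall evaluation score.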

Dan Jarratt answered Sep 28 '22