Can someone help me interpret the AWS Personalize solution version metrics in layman’s terms or, at the very least, tell me what these metrics should ideally look like?
I have no knowledge of Machine Learning and wanted to take advantage of Personalize as it is marketed as a 'no-previous-knowledge-required' ML SaaS. However, the “Solution version metrics” in my solution results seem to require a fairly high level of math knowledge.
My Solution version metrics are as follows:
Normalized discounted cumulative gain
At 5: 0.9881, At 10: 0.9890, At 25: 0.9898
Precision
At 5: 0.1981, At 10: 0.0993, At 25: 0.0399
Mean reciprocal rank
At 25: 0.9833
Research
I have looked through the Personalize Developer's Guide which includes a short definition of each metric on page 72. I also attempted to skim through the Wikipedia articles on discounted cumulative gain and mean reciprocal rank. From reading, this is my interpretation of each metric:
NDCG = Consistency of relevance of recommendations; Is the first recommendation as relevant as the last?
Precision = Relevance of recommendations to user; How relevant are your recommendations to users across the board?
MRR = Relevance of first recommendation in the list versus the others in the list; How relevant is your first recommendation to each user?
If these interpretations are right, then my solution metrics indicate that I am highly consistent about recommending irrelevant content. Is that a valid conclusion?
Alright, my company has Developer Tier Support so I was able to get an answer to this question from AWS.
Answer Summary
The metrics are better the closer they are to '1'. My interpretation of my metrics was pretty much correct but my conclusion was not.
Apparently, these metrics (and Personalize in general) do not take into account how much a user likes an item. Personalize only cares how soon a relevant recommendation gets to the user. This makes sense: if you get to the 25th item in a queue and haven't liked anything you've seen, you are not likely to continue looking.
Given this, what's happening in my solution is that the first-ish recommendation is relevant but none of the others are.
Detailed Answer from AWS
I will start with the relatively easier question first: what are the ideal values for these metrics, so that one solution version can be preferred over another? The answer is that for each metric, higher numbers are better. [1] If you have more than one solution version, prefer the solution version with higher values for these metrics. Please note that you can create a number of solution versions by overriding the default recipe parameters [2] and by using hyperparameters [3].
The second question: how to understand and interpret the metrics for an AWS Personalize solution version? I can confirm from my research that the definitions and interpretations you provided for these metrics are valid.
Before I explain each metric, here is a primer on one of the main concepts in machine learning: how are these metrics calculated? When a solution version is created, the model training step splits the input dataset into two parts, a training dataset (~70%) and a test dataset (~30%). The training dataset is used during model training. Once the model is trained, it is used to predict values for the test dataset, and each prediction is validated against the known (correct) value in the test dataset. [4]
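As a rough illustration of that holdout idea, here is a minimal sketch in Python. The interaction data, the 70/30 ratio, and the split mechanics are all illustrative assumptions, not Personalize's actual internals:

```python
# Illustrative holdout split: shuffle the interactions, then keep ~70%
# for training and hold out ~30% for evaluating predictions.
import random

random.seed(42)  # fixed seed so the split is reproducible
interactions = [("user%d" % (i % 10), "item%d" % i) for i in range(100)]
random.shuffle(interactions)

cut = int(len(interactions) * 0.7)
train, test = interactions[:cut], interactions[cut:]

print(len(train), len(test))  # 70 30
```

The metrics below are then computed by comparing the model's ranked recommendations against the held-out `test` interactions.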
I researched further to find more resources explaining the concepts behind these metrics, and below I elaborate on the example provided in the AWS documentation. [1]
"mean_reciprocal_rank_at_25"
Let’s first understand reciprocal rank. For example, a movie streaming service uses a solution version to predict a list of 5 recommended movies for a specific user: A, B, C, D, E. Once these 5 recommended movies are compared against the movies that user actually liked (in the test dataset), we find that only movies B and E were liked. The reciprocal rank considers only the first relevant (correct according to the test dataset) recommendation, which is movie B at rank 2, and ignores movie E at rank 5. Thus the reciprocal rank is 1/2 = 0.5.
Now let’s expand the above example to understand mean reciprocal rank: [5] assume that we ran predictions for three users and the movies below were recommended.
User 1: A, B, C, D, E (user liked B and E, thus the Reciprocal Rank is 1/2)
User 2: F, G, H, I, J (user liked H and I, thus the Reciprocal Rank is 1/3)
User 3: K, L, M, N, O (user liked K, M and N, thus the Reciprocal Rank is 1)
The mean reciprocal rank is the sum of all the individual reciprocal ranks divided by the total number of prediction queries, which is 3: (1/2 + 1/3 + 1) / 3 = (0.5 + 0.33 + 1) / 3 = 1.83 / 3 ≈ 0.61.
In the AWS Personalize solution version metrics, the mean of the reciprocal ranks of the first relevant recommendation out of the top 25 recommendations, over all queries, is called “mean_reciprocal_rank_at_25”.
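The three-user calculation above can be reproduced in a few lines of Python; the recommendation lists and liked sets are the hypothetical ones from the example:

```python
def reciprocal_rank(recommended, liked):
    """Return 1/rank of the first recommended item the user actually liked."""
    for rank, item in enumerate(recommended, start=1):
        if item in liked:
            return 1.0 / rank
    return 0.0  # no relevant item anywhere in the list

queries = [
    (["A", "B", "C", "D", "E"], {"B", "E"}),       # first hit at rank 2 -> 1/2
    (["F", "G", "H", "I", "J"], {"H", "I"}),        # first hit at rank 3 -> 1/3
    (["K", "L", "M", "N", "O"], {"K", "M", "N"}),   # first hit at rank 1 -> 1
]

mrr = sum(reciprocal_rank(rec, liked) for rec, liked in queries) / len(queries)
print(round(mrr, 2))  # 0.61
```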
"precision_at_K"
Precision can be stated as the capability of a model to deliver the relevant elements with the least number of recommendations. The concept of precision is described in the following free video available on Coursera. [6] A very good article on the same topic can be found here. [7]
Let’s consider the same example: a movie streaming service uses a solution version to predict a list of 5 recommended movies for a specific user: A, B, C, D, E. Once these 5 recommended movies are compared against the movies that user actually liked (correct values in the test dataset), we find that only movies B and E were liked. The precision_at_5 is 2 correctly predicted movies out of 5 total recommendations: 2/5 = 0.4.
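That calculation is short enough to sketch directly, again using the hypothetical recommendation list from the example:

```python
def precision_at_k(recommended, liked, k):
    """Fraction of the top-k recommendations that the user actually liked."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in liked)
    return hits / k

p = precision_at_k(["A", "B", "C", "D", "E"], {"B", "E"}, 5)
print(p)  # 0.4
```

Note that the denominator is always k, which is why the poster's precision drops from ~0.2 at 5 to ~0.04 at 25: roughly one relevant item is being found regardless of list length.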
"normalized_discounted_cumulative_gain_at_K"
This metric uses the concept of the logarithm and the logarithmic scale to assign a weighting factor to relevant items (correct values in the test dataset). A full description of logarithms is beyond the scope of this document; the main objective of using a logarithmic scale is to compress wide-ranging quantities into a small range.
discounted_cumulative_gain_at_K
Let’s consider the same example: a movie streaming service uses a solution version to predict a list of 5 recommended movies for a specific user: A, B, C, D, E. Once these 5 recommended movies are compared against the movies that user actually liked (correct values in the test dataset), we find that only movies B and E were liked. To produce the discounted cumulative gain (DCG) at 5, each relevant item is assigned a weighting factor (using a logarithmic scale) based on its position in the top 5 recommendations. The value produced by this formula is called the “discounted value”.
The formula is 1/log2(1 + position) (the base-2 logarithm is what reproduces the 0.6241 figure below).
As B is at position 2, its discounted value is 1/log2(1 + 2).
As E is at position 5, its discounted value is 1/log2(1 + 5).
The discounted cumulative gain (DCG) is calculated by adding the discounted values for both relevant items: DCG = 1/log2(1 + 2) + 1/log2(1 + 5).
normalized_discounted_cumulative_gain_at_K
First of all, what is the “ideal DCG”? In the above example, the ideal prediction would be B, E, A, C, D, with the relevant items at positions 1 and 2. To produce the ideal DCG at 5, each relevant item is again assigned a weighting factor (using the same logarithmic scale) based on its position in the top 5 recommendations.
The formula is 1/log2(1 + position).
As B is at position 1, its discounted value is 1/log2(1 + 1).
As E is at position 2, its discounted value is 1/log2(1 + 2).
The ideal DCG is calculated by adding the discounted values for both relevant items: ideal DCG = 1/log2(1 + 1) + 1/log2(1 + 2).
The normalized discounted cumulative gain (NDCG) is the DCG divided by the ideal DCG: DCG / ideal DCG = (1/log2(1 + 2) + 1/log2(1 + 5)) / (1/log2(1 + 1) + 1/log2(1 + 2)) ≈ 0.6241
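The whole NDCG calculation can be verified with a short Python sketch, assuming the base-2 logarithm and the same hypothetical lists as above:

```python
import math

def dcg(recommended, liked):
    """Sum 1/log2(1 + position) over the relevant items in the ranked list."""
    return sum(
        1 / math.log2(1 + pos)
        for pos, item in enumerate(recommended, start=1)
        if item in liked
    )

liked = {"B", "E"}
actual = dcg(["A", "B", "C", "D", "E"], liked)  # 1/log2(3) + 1/log2(6)
ideal = dcg(["B", "E", "A", "C", "D"], liked)   # 1/log2(2) + 1/log2(3)
ndcg = actual / ideal
print(round(ndcg, 4))  # 0.6241
```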
I hope the information provided above is helpful in understanding the concept behind these metrics.
[1] https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html
[2] https://docs.aws.amazon.com/personalize/latest/dg/customizing-solution-config.html
[3] https://docs.aws.amazon.com/personalize/latest/dg/customizing-solution-config-hpo.html
[4] https://medium.com/@m_n_malaeb/recall-and-precision-at-k-for-recommender-systems-618483226c54
[5] https://www.blabladata.com/2014/10/26/evaluating-recommender-systems/
[6] https://www.coursera.org/lecture/ml-foundations/optimal-recommenders-4EQc2
[7] https://medium.com/@bond.kirill.alexandrovich/precision-and-recall-in-recommender-systems-and-some-metrics-stuff-ca2ad385c5f8