
How do scoring profiles generate scores in Azure Search?

I want to add a scoring profile on my index on Azure Search. More specifically, every document in my index has a weight field of type Edm.Double, and I want to boost them according to this value. I don't want to just directly sort them with respect to weight because the relevance of the search term is also important.

So just to test it out, I created a scoring profile with a magnitude function with a boost value of 1000 (just to see whether I understood how this works), linear interpolation, a starting value of 0 and an ending value of 1. I expected the boost value to be added to the overall search score, so a document with weight 0.5 would get a boost of 500, whereas a document with weight 0.125 would get a boost of 125. However, the resulting scores were nowhere near this intuitive.

I have a couple of questions in this case:

1) How is the function score generated in this case? I have documents with weights close to each other (let's say 0.5465 and 0.5419), but the difference between their final scores is around 100-150, whereas I would expect it to be around 4-5.

2) How are function scores and weights aggregated into a final score for each search result?

asked Jan 02 '17 13:01 by halileohalilei




1 Answer

The answer provided by Nate is difficult to understand and misses some components. I have made an overview of the entire scoring process, and it's quite complex.

When a user executes a search, a query is sent to Azure Search. Azure Search uses the TF-IDF algorithm to determine a score from 0 to 1, based on the tokens produced by the analyzer. Keep in mind that language-specific analyzers can produce multiple tokens for one word. A score is produced for every searchable field and then multiplied by that field's weight in the scoring profile. Finally, all weighted scores are summed, and that sum is the initial weighted score.
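The weighting step described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the service's implementation: the function name, the field names and the per-field scores are hypothetical, and the default weight of 1.0 for fields without an explicit weight is an assumption.

```python
def weighted_text_score(field_scores, weights):
    """Sum each field's relevance score multiplied by its profile weight.

    Assumption: a field that has no weight in the profile counts with 1.0.
    """
    return sum(
        score * weights.get(field, 1.0)
        for field, score in field_scores.items()
    )


# Hypothetical per-field relevance scores and profile weights.
field_scores = {"title": 0.6, "body": 0.2}
weights = {"title": 2.0}

# 0.6 * 2.0 for "title" plus 0.2 * 1.0 for "body".
print(weighted_text_score(field_scores, weights))
```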

A scoring profile might also contain scoring functions. A scoring function can be a magnitude, freshness, distance (geo) or tag function. Multiple functions can be defined within one scoring profile.

The functions are evaluated, and their scores are aggregated by taking the sum, average, minimum, maximum or first matching function. The aggregated function score is then multiplied by the total weighted score, and that's the final score.
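A minimal Python sketch of this aggregation step, following the model described in this answer. The helper names are hypothetical, and "firstMatching" is simplified here to "the first score in the list", which is an assumption about how matching is decided.

```python
def aggregate_function_scores(scores, mode="sum"):
    """Combine the individual function scores into a single number."""
    if not scores:
        raise ValueError("at least one function score is required")
    if mode == "sum":
        return sum(scores)
    if mode == "average":
        return sum(scores) / len(scores)
    if mode == "minimum":
        return min(scores)
    if mode == "maximum":
        return max(scores)
    if mode == "firstMatching":
        # Simplification: treat the first listed function as the match.
        return scores[0]
    raise ValueError(f"unknown aggregation mode: {mode}")


def final_score(weighted_text_score, scores, mode="sum"):
    """Multiply the weighted text score by the aggregated function score."""
    return weighted_text_score * aggregate_function_scores(scores, mode)


# Hypothetical inputs: weighted text score 2.0, two function scores.
for mode in ("sum", "average", "minimum", "maximum", "firstMatching"):
    print(mode, final_score(2.0, [3.0, 5.0], mode))
```

Switching the mode only changes how the function scores are combined; the multiplication with the weighted text score stays the same.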

As an example, here is an index definition with two scoring profiles.

{  
  "name": "musicstoreindex",  
  "fields": [  
    { "name": "key", "type": "Edm.String", "key": true },  
    { "name": "albumTitle", "type": "Edm.String" },  
    { "name": "genre", "type": "Edm.String" },  
    { "name": "genreDescription", "type": "Edm.String", "filterable": false },  
    { "name": "artistName", "type": "Edm.String" },  
    { "name": "rating", "type": "Edm.Int32" },  
    { "name": "price", "type": "Edm.Double", "filterable": false },  
    { "name": "lastUpdated", "type": "Edm.DateTimeOffset" }  
  ],  
  "scoringProfiles": [  
    {  
      "name": "boostGenre",  
      "text": {  
        "weights": {  
          "albumTitle": 1.5,  
          "genre": 5,  
          "artistName": 2  
        }  
      }  
    },  
    {  
      "name": "newAndHighlyRated",  
      "functions": [  
        {  
          "type": "freshness",  
          "fieldName": "lastUpdated",  
          "boost": 10,  
          "interpolation": "linear",  
          "freshness": {  
            "boostingDuration": "P365D"  
          }  
        },  
        {
          "type": "magnitude",  
          "fieldName": "rating",  
          "boost": 8,  
          "interpolation": "linear",  
          "magnitude": {  
            "boostingRangeStart": 1,  
            "boostingRangeEnd": 5,  
            "constantBoostBeyondRange": false  
          }  
        }  
      ],
      "functionAggregation": 0
    }  
  ]
}

Let's say the entered query is meteora, the famous album by Linkin Park, and we have the following document in our index.

{
    "key": "123",
    "albumTitle": "Meteora",
    "genre": "Rock",
    "genreDescription": "Rock with a flick of hiphop",
    "artistName": "Linkin Park",
    "rating": 4,
    "price": 30,
    "lastUpdated": "2020-01-01" 
}

I'm not an expert on TF-IDF, but I can imagine that the following unweighted scores will be produced:

{
    "albumTitle": 1,
    "genre": 0,
    "genreDescription": 0,
    "artistName": 0
}

The scoring profile has a weight of 1.5 on the albumTitle field, so the total weighted score will be: 1 * 1.5 + 0 + 0 + 0 = 1.5

After that, the scoring profile functions are evaluated. In this case there are two. The first one evaluates freshness over a boosting duration of 365 days, i.e. one year. Let's say the lastUpdated value of the document lies 50 days in the past. The document gets a score of 1 if it was last updated today and 0 if it was last updated 365 or more days ago. In our case that's 1 - 50 / 365 = 0.8630... The boost of the function is 10, so the score for the first function is 8.630.
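The freshness arithmetic can be checked with a short Python sketch. It follows the linear model described in this answer; the function name and the concrete dates are hypothetical and chosen only so that the document is 50 days old.

```python
from datetime import datetime, timedelta, timezone


def freshness_score(last_updated, now, boosting_duration, boost):
    """Linear freshness: full boost for 'now', no boost at or beyond the duration."""
    age = now - last_updated
    fraction = 1.0 - age / boosting_duration  # timedelta / timedelta gives a float
    fraction = max(0.0, min(1.0, fraction))   # keep the fraction within [0, 1]
    return boost * fraction


# Hypothetical dates: the document was last updated 50 days before 'now'.
now = datetime(2020, 5, 21, tzinfo=timezone.utc)
last_updated = now - timedelta(days=50)

score = freshness_score(last_updated, now, timedelta(days=365), boost=10)
print(round(score, 3))
```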

The second function is a magnitude function with a range from 1 to 5. With linear interpolation, 1 star is worth 0 and 5 stars is worth 1, so the document's 4-star rating is worth (4 - 1) / (5 - 1) = 0.75. The boost of the magnitude function is 8, so we have to multiply the value by 8: 0.75 * 8 = 6.

The functionAggregation is 0, which means we have to sum the results of all functions. This gives a total function score of 6 + 8.630 = 14.63. The rule is then to multiply the total function score by the total weighted score of the fields, giving us a grand total of 14.63 * 1.5 = 21.945.
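The whole example can be recomputed end to end in Python as a sanity check. This is a sketch of the model described in this answer, not of the service internals. It assumes the magnitude function maps the start of the range (1 star) to 0 and the end of the range (5 stars) to 1 on a straight line, so a rating of 4 normalises to (4 - 1) / (5 - 1) = 0.75. The function name is hypothetical, and the handling of values outside the range is a simplification.

```python
def linear_magnitude(value, start, end, boost, constant_boost_beyond_range=False):
    """Linear magnitude boost: 0 at 'start', full boost at 'end'."""
    if value < start:
        return 0.0
    if value > end:
        # Simplified model: beyond the range, boost only if the flag is set.
        return float(boost) if constant_boost_beyond_range else 0.0
    return boost * (value - start) / (end - start)


# Weighted text score: albumTitle scored 1 and carries a weight of 1.5.
weighted_text = 1 * 1.5

# Freshness: 50 days old, 365-day boosting duration, boost 10.
freshness = 10 * (1 - 50 / 365)

# Magnitude: rating 4 in the range 1 to 5, boost 8.
magnitude = linear_magnitude(4, 1, 5, 8)

# Sum aggregation, then multiply by the weighted text score.
total = weighted_text * (freshness + magnitude)
print(round(freshness, 3), magnitude, round(total, 3))
```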

Hope you enjoyed this example.

answered Nov 01 '22 22:11 by Dibran