Elasticsearch - Rank userIds based on score

Tags:

elasticsearch

I'm trying to migrate some of the queries of our old MySQL database to our new Elasticsearch setup. The data is a little bit more complex but boils down to the following:

I've got an index containing a lot of scores. Each score represents the points a player scored in a particular game.

{
  "userId": 2,
  "scoreId": 3457,
  "game": {
    "id": 6,
    "name": "scrabble"
  },
  "date": 1340047100,
  "score": 56,
  // and more game data
}

scoreId is the unique id for this score, game.id is the id of that type of game.

{
  "userId": 6,
  "scoreId": 3479,
  "game": {
    "id": 5,
    "name": "risk"
  },
  "date": "1380067200",
  "score": 100,
  // and more game data
}

Over the years a lot of different games are played and I would like to rank the best players for each type of game. The ranking is based on the best 6 games of each player. So for example, if a player played scrabble 10 times, only its 6 best scores count for its total score.

I would like to create a list like:

// Scrabble ranking:
# | user | total points  
1 |  2   | 4500
2 |  6   | 3200
3 |  23  | 1500
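
For reference, the ranking rule described above (sum each player's 6 best scores for one game, then order players by that total) can be sketched in plain Python. This is only an illustration of the intended result, not an Elasticsearch query; the function name `rank_players` and the `best_n` parameter are made up for this example, and the documents are assumed to look like the samples shown earlier.

```python
from collections import defaultdict
import heapq


def rank_players(scores, game_id, best_n=6):
    """Return [(user_id, total_points)] for one game, best total first.

    `scores` is assumed to be an iterable of dicts shaped like the
    sample documents (userId, game.id, score).
    """
    per_user = defaultdict(list)
    for doc in scores:
        if doc["game"]["id"] == game_id:
            per_user[doc["userId"]].append(doc["score"])

    # Only the best_n highest scores of each player count.
    totals = {
        user: sum(heapq.nlargest(best_n, points))
        for user, points in per_user.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

A player with more than 6 scores contributes only the 6 highest ones; a player with fewer contributes all of them.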

The reason for the migration is that the old MySQL code first gets a list of all the distinct users for each game, and then executes another query for EACH user to get their best 6 scores. I hoped that I could use Elasticsearch aggregations to do it all in just one query, but so far I can't make it work.

The problem is that after a couple of hours of reading the elastic docs it seems that my problem is more complex than the examples. Maybe if someone can point me a bit in the right direction I can continue my search. At least this is not getting me anywhere:

GET /my-index/scores/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": { "game.id": 6 }}
      ]
    }
  },
  "aggs": {
    "scores": {
      "terms": {
        "field": "userId"
      }
    },
    "top_scores_user": {
      "top_hits": {
        "sort": [{
          "score": {
            "order": "desc"
          }
        }],
        "size" : 6
      }
    }
  },
   "size": 0
}

I'm using elastic 2.3 but there's a chance I could upgrade if it's really necessary.


asked May 03 '17 15:05

Tieme

1 Answer

Using top_hits will not let you achieve what you need, because you cannot act upon the fields that are returned for each document in the top hits aggregation.

One way to get around this is to use a top-level terms aggregation on users (as you did), and for each user a terms sub-aggregation on the scores, sorted in decreasing order and limited to the 6 best ones. Finally, a pipeline sum_bucket aggregation sums up those 6 scores for each user.

POST /my-index/scores/_search    
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "game.id": 6
          }
        }
      ]
    }
  },
  "aggs": {
    "users": {
      "terms": {              <--- segment by user
        "field": "userId"
      },
      "aggs": {
        "best_scores": {
          "terms": {          <--- 6 best scores for user
            "field": "score",
            "order": {
              "_term": "desc"
            },
            "size": 6
          },
          "aggs": {
            "total_score": {
              "sum": {
                "field": "score"
              }
            }
          }
        },
        "total_points": {     <--- total points for the user based on 6 best scores
          "sum_bucket": {
            "buckets_path": "best_scores > total_score"
          }
        }
      }
    }
  }
}

Note that one drawback of this solution is that the terms aggregation creates one bucket per distinct score value. If a user achieved the exact same score twice, both occurrences land in the same bucket, so the 6 buckets cover 7 scores and the total_points value will be too high. We could use avg instead of the sum metric aggregation inside each bucket, but then the duplicated score would be counted only once, which is not good either.
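
To make the duplicate-score drawback concrete, here is a small Python sketch that mimics, for a single user, what the best_scores terms aggregation combined with the sum metric and sum_bucket would compute, and compares it with the true "best 6" total. The helper names `terms_then_sum` and `true_best_n` and the sample scores are invented for this illustration; nothing here calls Elasticsearch.

```python
from collections import Counter


def terms_then_sum(scores, size=6):
    """Mimic terms-on-score (desc, size=6) + sum per bucket + sum_bucket."""
    counts = Counter(scores)  # one bucket per DISTINCT score value
    top_values = sorted(counts, reverse=True)[:size]
    # Each bucket's sum covers every document with that score value.
    return sum(value * counts[value] for value in top_values)


def true_best_n(scores, n=6):
    """Sum of the n highest individual scores, duplicates included."""
    return sum(sorted(scores, reverse=True)[:n])


scores = [90, 90, 80, 70, 60, 50, 40]  # 90 was scored twice
overcounted = terms_then_sum(scores)   # 6 buckets, but 7 scores
expected = true_best_n(scores)         # the 6 best scores only
```

With these sample scores the bucket-based total is 480, while the intended total is 440: the extra 40 comes from the seventh score slipping in because the two 90s share one bucket.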

Also note that it would be ideal to sort the users according to their total_points value, but it is not possible to sort using pipeline aggregations (since they run after the reduce phase). The sorting will need to happen on the client side.
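
The client-side sorting could look roughly like the following Python sketch. It assumes the search response has already been parsed into a dict (for example from the JSON body returned by the query above) and that the aggregation names are `users` and `total_points` as in that query; the function name `rank_from_response` is hypothetical.

```python
def rank_from_response(response):
    """Build [(rank, user_id, total_points)] from the aggregation response."""
    buckets = response["aggregations"]["users"]["buckets"]

    # Each terms bucket exposes the user id as "key"; the sum_bucket
    # pipeline result is available under its own name as {"value": ...}.
    totals = [(b["key"], b["total_points"]["value"]) for b in buckets]

    # Sort on the client, since the terms aggregation cannot order by
    # a pipeline aggregation.
    totals.sort(key=lambda pair: pair[1], reverse=True)
    return [(rank, user, total)
            for rank, (user, total) in enumerate(totals, start=1)]
```

Keep in mind that a terms aggregation returns only 10 buckets by default, so the `size` of the `users` aggregation probably needs to be raised to cover every player before ranking them this way.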


answered Sep 29 '22 13:09

Val
