
How to exclude a large number of IDs from an Elasticsearch query

I'm working on an app similar to Tinder. In Elasticsearch I have a collection of about half a million users and their locations. Whenever a user opens the app to search for nearby users, I run an Elasticsearch query over that collection. The query is fairly complex: it takes into account not only the location but also how active the user is and how many photos they have.

What I struggle with is how to exclude from the query those users the current user has already swiped through. A naive way to implement this would probably be to maintain a nested array of user IDs as part of every user document in the index and exclude based on that. But since every user makes tens of thousands of swipes, that array could grow very large, so it's not a scalable solution.

Is there a way to exclude a large number of entities from an Elasticsearch query based on their IDs without hurting performance?

Martin Šťáva asked Oct 07 '15 10:10



2 Answers

You can try adding the ids filter into a bool/must_not clause of your complex query and see how it behaves.

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
              ...                <--- your other "must" constraints
          ],
          "must_not": [
            {
              "ids": {
                "values": [ "id1", "id2", "id3" ]  <--- your list of ids to exclude
              }
            }
          ]
        }
      }
    }
  }
}
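Note that the `filtered` query shown above was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions the same exclusion is usually written as a plain `bool` query. A minimal Python sketch that builds such a request body follows; the helper name `build_exclusion_query` and the sample `term` filter on an `active` field are illustrative assumptions, not part of the original answer.

```python
import json


def build_exclusion_query(excluded_ids, filters=None):
    """Build a bool query that drops documents whose _id is in excluded_ids.

    `filters` stands in for the other constraints of the real query
    (hypothetical here: geo distance, activity, and so on).
    """
    return {
        "query": {
            "bool": {
                # Non-scoring constraints; takes over from the old "filtered" wrapper.
                "filter": list(filters or []),
                # Documents matching any clause here are excluded.
                "must_not": [
                    {"ids": {"values": list(excluded_ids)}}
                ],
            }
        }
    }


if __name__ == "__main__":
    body = build_exclusion_query(
        ["id1", "id2", "id3"],
        filters=[{"term": {"active": True}}],  # assumed example field
    )
    print(json.dumps(body, indent=2))
```

The ID list still travels with every request, so this is mainly worth trying when the list stays reasonably small.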
Val answered Oct 12 '22 03:10


Use the lookup feature of the Terms query: Terms lookup mechanism

When it’s needed to specify a terms filter with a lot of terms it can be beneficial to fetch those term values from a document in an index. A concrete example would be to filter tweets tweeted by your followers. Potentially the amount of user ids specified in the terms filter can be a lot. In this scenario it makes sense to use the terms filter’s terms lookup mechanism.
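Applied to the swipe problem, one way to use terms lookup is to keep each user's swiped IDs in a separate document and reference that document from the query. The sketch below builds such a request in Python; the index name `swipes`, the field names `user_id` and `swiped_ids`, and the helper itself are assumptions made for illustration. The lookup keys shown (`index`, `id`, `path`) follow recent Elasticsearch versions; older releases also required a `type`.

```python
import json


def build_lookup_exclusion_query(current_user_id, filters=None):
    """Exclude users listed in the current user's swipe document.

    Assumes a hypothetical `swipes` index holding one document per user,
    with the swiped user IDs stored in a `swiped_ids` array, and a
    `user_id` keyword field on the documents being searched.
    """
    lookup = {
        "index": "swipes",        # assumed index with one doc per user
        "id": current_user_id,    # document holding this user's swipes
        "path": "swiped_ids",     # field in that document with the IDs
    }
    return {
        "query": {
            "bool": {
                # Placeholder for the other constraints of the real query.
                "filter": list(filters or []),
                # Elasticsearch resolves the term list from the lookup document.
                "must_not": [{"terms": {"user_id": lookup}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_lookup_exclusion_query("user42"), indent=2))
```

The request body stays small no matter how many swipes are stored, because Elasticsearch fetches the ID list itself. Recent versions cap the number of terms per `terms` query through the `index.max_terms_count` setting (65,536 by default), which is worth checking against the expected number of swipes per user.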

Roeland Van Heddegem answered Oct 12 '22 03:10