
A better approach to exclude large list of items in Elasticsearch

I use a terms query to exclude a list of 100,000 or more items. Because the terms query allows only 65,536 terms by default, ES throws the following error:

The number of terms [115687] used in the Terms Query request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the [index.max_terms_count] index level setting.

One way to solve my problem is to raise the index.max_terms_count setting, but I suspect queries with that many terms will be slow.
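For reference, a minimal sketch of what raising that limit could look like. It only builds the request for the index settings endpoint and does not send it; the index name `products` and the value `200000` are placeholders, not values from the question.

```python
import json


def max_terms_settings_request(index: str, max_terms: int) -> tuple[str, str, str]:
    """Build (method, path, body) for updating index.max_terms_count.

    The index name and the new limit are supplied by the caller;
    nothing here is sent to a cluster.
    """
    path = f"/{index}/_settings"
    body = json.dumps({"index": {"max_terms_count": max_terms}})
    return "PUT", path, body


# Hypothetical index name and limit, for illustration only.
method, path, body = max_terms_settings_request("products", 200000)
print(method, path, body)
```

Whether a terms query of that size performs acceptably would still need to be measured on the actual cluster.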

Another solution would be to exclude those items in PHP, but that would also consume too many resources.

Is there a better way to exclude a large list of items from ES search results?

Salim Ibrohimi asked Apr 08 '21 10:04




1 Answer

  1. For rare cases, I suggest a client-oriented solution: split the exclusion list in two. Let ES process the first 65,536 items and filter out the rest in PHP.
  2. Performance-oriented solution: cap the exclusion list at 65,536 items (a client-side limit).
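
The split described in option 1 can be sketched as follows. This is an illustration in Python rather than PHP, under stated assumptions: the field name `id` and the helper names are hypothetical, and hits are assumed to have the usual `_source` shape of an ES search response.

```python
from typing import Any, Iterable

# Default limit quoted in the error message (index.max_terms_count).
MAX_TERMS = 65536


def split_exclusions(ids: Iterable[Any], limit: int = MAX_TERMS) -> tuple[list, set]:
    """Split the exclusion list: the first `limit` IDs go to ES, the rest stay client-side."""
    ids = list(ids)
    return ids[:limit], set(ids[limit:])


def build_query(es_excluded: list, field: str = "id") -> dict:
    """Build a bool query whose must_not clause excludes the ES-side IDs."""
    return {"query": {"bool": {"must_not": [{"terms": {field: es_excluded}}]}}}


def filter_hits(hits: list[dict], client_excluded: set, field: str = "id") -> list[dict]:
    """Drop hits whose field value is in the client-side exclusion set."""
    return [h for h in hits if h["_source"][field] not in client_excluded]


# Example with a made-up exclusion list of 100,000 integer IDs.
es_part, client_part = split_exclusions(range(100_000))
query = build_query(es_part)
sample_hits = [{"_source": {"id": 70_000}}, {"_source": {"id": 200_000}}]
remaining = filter_hits(sample_hits, client_part)
```

Because the second step removes hits after ES has already paginated, a page can come back with fewer results than requested, so the caller may need to fetch extra hits to compensate.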

Dmitry Bordun answered Sep 26 '22 00:09