I'm planning to use a Terms Query with many terms (up to 40-50k, depending on the case) in all my queries.
These terms will be fetched from another index using lookup as explained here. Elasticsearch takes them internally, so at least they won't go through the wire, but the query itself looks quite heavy.
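For reference, a minimal sketch of what such a terms lookup request body looks like, built as a plain Python dict. The index name `user_groups`, the document id `group-42`, and the field names `user_id` and `member_ids` are made up for illustration; only the `index` / `id` / `path` structure comes from the terms lookup syntax.

```python
import json


def build_terms_lookup_query(field, lookup_index, lookup_id, lookup_path):
    """Build a terms query whose terms are read from another document.

    Elasticsearch resolves the terms server-side from the document
    `lookup_id` in `lookup_index`, reading the values stored under
    `lookup_path`, so the term list is never sent by the client.
    """
    return {
        "query": {
            "terms": {
                field: {
                    "index": lookup_index,
                    "id": lookup_id,
                    "path": lookup_path,
                }
            }
        }
    }


# Hypothetical names, for illustration only.
query = build_terms_lookup_query("user_id", "user_groups", "group-42", "member_ids")
print(json.dumps(query, indent=2))
```

The request stays tiny no matter how many values the looked-up document holds, which is why the cost in question is on the search side and not in the transfer.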
I was wondering whether the query performance will be acceptable. I'm planning to run a stress test anyway, but I'm not sure this is going to scale well. Has anyone had experience with this kind of query, or does anyone know how Elasticsearch handles it internally?
Thank you!
Performance after hundreds of terms will degrade fast: https://github.com/elastic/elasticsearch/issues/18829
It was originally mentioned in this larger thread: https://github.com/elastic/elasticsearch/issues/11511#issuecomment-224028056
ES will search each term individually across your shards, so the more terms you add, the more it bogs the cluster down. As with anything in Elasticsearch, though, tuning shard counts (replicas in your case), node counts, and other configuration options might help. I'd suggest performance testing to find out what you're dealing with, but don't expect good results out of the box.
I opened an issue in the Elasticsearch repo about this and, as I feared, this kind of query gets very slow with many terms, even when using lookup.
As I mentioned in the issue, I also stress tested it myself: filtering with around 20 thousand terms makes the query quite slow (more than 500 ms).
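A rough sketch of how such a measurement could be set up, in case someone wants to repeat it on their own cluster. This is not the script I used. The `search` argument is a placeholder for whatever function sends the request body to your cluster; the field name and the generated `id-N` terms are made up too.

```python
import statistics
import time


def build_terms_query(field, terms):
    """Build a filter-context terms query with the terms listed inline."""
    return {"query": {"bool": {"filter": [{"terms": {field: list(terms)}}]}}}


def measure_latency_ms(search, field, term_counts, repeats=5):
    """Return the median wall-clock latency in ms for each term count.

    `search` is any callable that takes a request body and runs it
    against the cluster (an assumption; plug in your own client call).
    """
    results = {}
    for count in term_counts:
        body = build_terms_query(field, (f"id-{i}" for i in range(count)))
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            search(body)
            samples.append((time.perf_counter() - start) * 1000.0)
        results[count] = statistics.median(samples)
    return results


# Stand-in for a real client call, so the sketch runs without a cluster.
def fake_search(body):
    return {"hits": {"total": 0}}


timings = measure_latency_ms(fake_search, "user_id", [100, 1000, 20000])
for count, millis in timings.items():
    print(f"{count} terms: {millis:.3f} ms")
```

Timing from the client includes network and serialization overhead, so it is worth comparing against the `took` value in the search response as well.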