
Performance of Terms Query with many elements

I'm planning to use a Terms Query with many terms (up to 40-50k, depending on the case) in all my queries.

These terms will be fetched from another index using terms lookup, as explained here. Elasticsearch fetches them internally, so at least they won't have to be sent over the wire by the client, but the query itself looks quite heavy.
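To make the setup concrete, here is a minimal sketch of what a terms lookup request body could look like, built as a plain Python dict. The field name `user_id`, the index `allowed_users`, the document id `group-1`, and the path `user_ids` are all hypothetical placeholders, not names from my actual indices. Depending on the Elasticsearch version, the lookup may also need a `type` key.

```python
import json


def build_terms_lookup_query(field, lookup_index, lookup_id, lookup_path):
    """Build a search body whose terms are read from another document.

    Instead of listing the terms inline, the terms query points at a
    document (index + id) and the field inside it (path) holding the values.
    """
    return {
        "query": {
            "bool": {
                # Filter context: no scoring is needed for a membership check.
                "filter": {
                    "terms": {
                        field: {
                            "index": lookup_index,
                            "id": lookup_id,
                            "path": lookup_path,
                        }
                    }
                }
            }
        }
    }


# All names below are hypothetical, for illustration only.
body = build_terms_lookup_query("user_id", "allowed_users", "group-1", "user_ids")
print(json.dumps(body, indent=2))
```

The client only sends this small body; the 40-50k values stay in the lookup document on the cluster side.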

I was wondering whether the query performance will be acceptable. I'm planning to do a stress test anyway, but I'm not sure this is going to scale well. Has anyone had experience with this kind of query, or does anyone know how Elasticsearch deals with them internally?

Thank you!

Antonio Val asked Mar 13 '17 06:03


2 Answers

Performance degrades quickly once you go beyond a few hundred terms: https://github.com/elastic/elasticsearch/issues/18829

That issue was originally mentioned in this larger thread: https://github.com/elastic/elasticsearch/issues/11511#issuecomment-224028056

ES will search each term individually across your shards, so the more terms you add, the more it bogs the cluster down. As with anything in Elasticsearch, though, tuning shard counts (replicas in your case), node counts, and other configuration options might help. I'd suggest performance testing to know what you're dealing with, but don't expect good results out of the box.

ryanlutgen answered Sep 19 '22 14:09


I opened an issue in the Elasticsearch repo about this, and as I feared, this kind of query gets very slow when used with many terms, even with lookup.

As I mentioned in the issue, I also stress tested it myself:

filtering with around 20 thousand terms makes the query quite slow (more than 500ms).
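For anyone who wants to run a similar check, below is a rough sketch of a timing harness, not the exact script I used. It takes any zero-argument callable that executes one search and reports wall-clock latency over several runs. `measure_latency_ms` and `fake_query` are hypothetical names; `fake_query` is only a stand-in for a real search call so that the sketch runs without a cluster.

```python
import statistics
import time


def measure_latency_ms(run_query, repeats=20):
    """Call run_query() `repeats` times and summarise wall-clock latency in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return {
        "runs": len(timings),
        "median_ms": statistics.median(timings),
        "max_ms": timings[-1],
    }


def fake_query():
    # Stand-in for a real search request (assumption: replace with your client call).
    time.sleep(0.001)


stats = measure_latency_ms(fake_query, repeats=5)
print(stats)
```

Client-side timings include network overhead, so it is worth comparing them with the `took` value (in milliseconds) that Elasticsearch returns in each search response.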

Antonio Val answered Sep 18 '22 14:09