
Elastic Search limit results

In MySQL I can do something like:

  SELECT id FROM table WHERE field = 'foo' LIMIT 5

If the table has 10,000 rows, then this query is way way faster than if I left out the LIMIT part.

In ElasticSearch, I've got the following:

 {
    "query":{
       "fuzzy_like_this_field":{
          "body":{
             "like_text":"REALLY LONG (snip) TEXT HERE",
             "max_query_terms":1,
             "min_similarity":0.95,
             "ignore_tf":true
          }
       }
    }
 }

When I run this search, it takes a few seconds, whereas MySQL can return results for the same query in far less time.

If I pass in the size parameter (set to 1), it successfully only returns 1 result, but the query itself isn't any faster than if I had set the size to unlimited and returned all the results. I suspect the query is being run in its entirety and only 1 result is being returned after the query is done processing. This means the "size" attribute is useless for my purposes.

Is there any way to have my search stop searching as soon as it finds a single record that matches the fuzzy search, rather than processing every record in the index before returning a response? Am I misunderstanding something more fundamental about this?

Thanks in advance.

asked Dec 20 '11 23:12 by Jemaclus




1 Answer

You are correct: the query is being run in its entirety. By default, queries return results sorted by score, so your query has to score each document. The docs state that the fuzzy query doesn't scale well, so you might want to consider other queries.

A limit filter might give you behavior similar to what you're looking for.

A limit filter limits the number of documents (per shard) to execute on
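
As a rough sketch of how that could look with the query from the question, the fuzzy query can be wrapped in a filtered query that carries a limit filter. This assumes an Elasticsearch version from that era, where the filtered query and limit filter are available; the value of 100 is an arbitrary placeholder, not a recommendation:

```json
{
  "size": 1,
  "query": {
    "filtered": {
      "query": {
        "fuzzy_like_this_field": {
          "body": {
            "like_text": "REALLY LONG (snip) TEXT HERE",
            "max_query_terms": 1,
            "min_similarity": 0.95,
            "ignore_tf": true
          }
        }
      },
      "filter": {
        "limit": { "value": 100 }
      }
    }
  }
}
```

Because the limit applies per shard, the number of documents examined scales with the shard count, and the hit returned is the best match among the documents that were examined rather than necessarily the best match in the whole index. In later Elasticsearch releases the limit filter was deprecated in favor of the terminate_after request parameter, so check what your version supports.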

To replicate MySQL's field = 'foo', try using a term filter. Use filters when you don't care about scoring: they are faster and cacheable.
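
A minimal sketch of the MySQL query from the question expressed with a term filter might look like the following. It assumes the same older filtered-query syntax, and the field name field and the value foo are simply carried over from the SQL example:

```json
{
  "size": 5,
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "term": { "field": "foo" }
      }
    }
  }
}
```

Note that a term filter compares against the terms stored in the index without analyzing the input, so for an exact match like SQL's = the field generally needs to be indexed as not_analyzed.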

answered Oct 13 '22 00:10 by Andy