"[circuit_breaking_exception] [parent] Data too large, data for [<http_request>] would be larger than the limit" error

After working smoothly for more than 10 months, my production cluster suddenly started returning this error on simple search queries.

{
  "error" : {
    "root_cause" : [
      {
        "type" : "circuit_breaking_exception",
        "reason" : "[parent] Data too large, data for [<http_request>] would be [745522124/710.9mb], which is larger than the limit of [745517875/710.9mb]",
        "bytes_wanted" : 745522124,
        "bytes_limit" : 745517875
      }
    ],
    "type" : "circuit_breaking_exception",
    "reason" : "[parent] Data too large, data for [<http_request>] would be [745522124/710.9mb], which is larger than the limit of [745517875/710.9mb]",
    "bytes_wanted" : 745522124,
    "bytes_limit" : 745517875
  },
  "status" : 503
}

At first I got this circuit_breaking_exception only on simple term queries. To debug it, I ran a _cat/health query against the Elasticsearch cluster and got the same error; even the simplest request to localhost:9200 fails the same way. I am not sure what suddenly happened to the cluster. Here is my circuit breaker status:

"breakers" : {
        "request" : {
          "limit_size_in_bytes" : 639015321,
          "limit_size" : "609.4mb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 639015321,
          "limit_size" : "609.4mb",
          "estimated_size_in_bytes" : 406826332,
          "estimated_size" : "387.9mb",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "in_flight_requests" : {
          "limit_size_in_bytes" : 1065025536,
          "limit_size" : "1015.6mb",
          "estimated_size_in_bytes" : 560,
          "estimated_size" : "560b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "accounting" : {
          "limit_size_in_bytes" : 1065025536,
          "limit_size" : "1015.6mb",
          "estimated_size_in_bytes" : 146387859,
          "estimated_size" : "139.6mb",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 745517875,
          "limit_size" : "710.9mb",
          "estimated_size_in_bytes" : 553214751,
          "estimated_size" : "527.5mb",
          "overhead" : 1.0,
          "tripped" : 0
        }
      }
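To read these figures at a glance, here is a minimal sketch that computes how full each breaker is. The numbers are copied from the output above; the helper names (`usage_percent`, `summarize`) are made up for illustration. In this snapshot the child estimates add up to exactly the parent estimate, and fielddata accounts for most of it.

```python
# Breaker figures copied from the node stats output above (bytes).
BREAKERS = {
    "request": {"limit": 639015321, "estimated": 0},
    "fielddata": {"limit": 639015321, "estimated": 406826332},
    "in_flight_requests": {"limit": 1065025536, "estimated": 560},
    "accounting": {"limit": 1065025536, "estimated": 146387859},
    "parent": {"limit": 745517875, "estimated": 553214751},
}


def usage_percent(breaker):
    """Return how full a breaker is, as a percentage of its limit."""
    return 100.0 * breaker["estimated"] / breaker["limit"]


def summarize(breakers):
    """Return (per-breaker usage %, sum of child estimates, parent headroom)."""
    usage = {name: usage_percent(b) for name, b in breakers.items()}
    children = sum(
        b["estimated"] for name, b in breakers.items() if name != "parent"
    )
    parent = breakers["parent"]
    headroom = parent["limit"] - parent["estimated"]
    return usage, children, headroom


usage, children, headroom = summarize(BREAKERS)
# Print the fullest breakers first.
for name, pct in sorted(usage.items(), key=lambda item: -item[1]):
    print(f"{name:20s} {pct:5.1f}% of limit")
print("sum of child estimates:", children)
print("parent headroom (bytes):", headroom)
```

The parent limit here is 70% of the 1015.6mb heap, so a heap this small leaves little room once fielddata has grown.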

I found a similar GitHub issue that suggests either increasing the circuit breaker memory limit or disabling the breaker, but I am not sure which to choose. Please help!

Elasticsearch Version 6.3

Raghu Chahar asked May 18 '20 13:05


2 Answers

After some more research, I finally found a solution:

  1. We should not disable the circuit breaker, as that can lead to an OOM error and eventually crash Elasticsearch.
  2. Dynamically increasing the circuit breaker memory percentage helps, but it is only a temporary fix, because the increased limit can eventually fill up as well.
  3. The third option is to increase the overall JVM heap size, which is 1 GB by default. On production the recommendation is to give the heap no more than 50% of the available RAM and to keep it below the compressed-oops threshold of roughly 32 GB.
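As a rough illustration of option 2, the sketch below builds (without sending) a cluster settings update that raises the parent breaker limit through the `indices.breaker.total.limit` setting. The value of 80% and the function name `breaker_limit_request` are assumptions chosen for the example, and the host is the `localhost:9200` address from the question.

```python
import json
import urllib.request


def breaker_limit_request(percent, host="http://localhost:9200"):
    """Build (but do not send) a PUT _cluster/settings request that raises
    the parent circuit breaker limit as a transient setting."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 0 and 100")
    # "transient" settings do not survive a full cluster restart.
    body = {"transient": {"indices.breaker.total.limit": f"{percent}%"}}
    return urllib.request.Request(
        url=f"{host}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# 80 is a hypothetical value, not a recommendation.
request = breaker_limit_request(80)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To apply it against a real cluster (not done here):
# urllib.request.urlopen(request)
```

If the cluster is already rejecting every HTTP request, this call may be rejected as well, which is one more reason to treat it as a stopgap.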

For more information on good JVM memory configuration for Elasticsearch in production, see Heap: Sizing and Swapping.
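The heap sizing rule can be sketched as a small calculation. This is only an illustration: the 31 GB cap is an assumed safe value below the compressed-oops threshold (the exact threshold varies by JVM), and the function names are hypothetical. The resulting `-Xms`/`-Xmx` lines are the kind of entries that go into Elasticsearch's `jvm.options` file.

```python
def heap_size_mb(total_ram_mb, cap_mb=31 * 1024):
    """Suggest a heap size in MB: half of total RAM, capped at cap_mb.

    cap_mb is an assumption; the real compressed-oops threshold depends
    on the JVM and should be verified on the target machine.
    """
    return min(total_ram_mb // 2, cap_mb)


def jvm_options(total_ram_mb):
    """Return matching -Xms/-Xmx lines so min and max heap are equal."""
    size = heap_size_mb(total_ram_mb)
    return [f"-Xms{size}m", f"-Xmx{size}m"]


# Example machines with 4 GB, 16 GB and 128 GB of RAM.
for ram_gb in (4, 16, 128):
    print(ram_gb, "GB RAM ->", " ".join(jvm_options(ram_gb * 1024)))
```

Setting the minimum and maximum heap to the same value avoids heap resizing at runtime.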

Raghu Chahar answered Nov 15 '22 00:11


In my case, I have an index with large documents: each document is ~30 KB and has more than 130 fields (nested objects, arrays, dates and ids). I was searching all fields using this DSL query:

query_string: {
    query: term,
    analyze_wildcard: true,
    fields: ['*'], // search all fields
    fuzziness: 'AUTO'
}

Full-text searches are expensive, and searching through multiple fields at once is even more expensive. That is expensive in terms of computing power, not storage.

Therefore:

The more fields a query_string or multi_match query targets, the slower it is. A common technique to improve search speed over multiple fields is to copy their values into a single field at index time, and then use this field at search time.

Please refer to the Elasticsearch docs, which recommend searching as few fields as possible with the help of the copy_to directive.

After I changed my query to search one field:

    query_string: {
        query: term,
        analyze_wildcard: true,
        fields: ['search_field'] // search in one field
    }

everything worked like a charm.
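For reference, the copy_to approach could look roughly like the sketch below, which builds an index-creation body and the matching single-field query. The source field names (`title`, `description`) and the helper names are hypothetical; `search_field` is the combined field from the query above. The `_doc` mapping type wrapper is an assumption that fits Elasticsearch 6.x and would be dropped on 7.x and later.

```python
import json


def build_mapping(source_fields, target_field="search_field", doc_type="_doc"):
    """Return an index-creation body in which every source field copies
    its value into one combined text field at index time."""
    properties = {
        name: {"type": "text", "copy_to": target_field}
        for name in source_fields
    }
    # The combined field is a plain text field that only receives copies.
    properties[target_field] = {"type": "text"}
    return {"mappings": {doc_type: {"properties": properties}}}


def build_query(term, target_field="search_field"):
    """Return a search body that queries only the combined field."""
    return {
        "query": {
            "query_string": {
                "query": term,
                "analyze_wildcard": True,
                "fields": [target_field],
            }
        }
    }


# Hypothetical field names, for illustration only.
mapping = build_mapping(["title", "description"])
print(json.dumps(mapping, indent=2))
print(json.dumps(build_query("laptop"), indent=2))
```

Because copy_to takes effect at index time, existing documents would need to be reindexed before the combined field is searchable.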

BaDr Amer answered Nov 14 '22 22:11