
Set request timeout in Elasticsearch for bulk loads [duplicate]

I want to set the request timeout to 20 seconds or more for Elasticsearch bulk uploads. The default is 10 seconds, and the warning message shows the request taking just over 10 seconds. Right after the warning is displayed, execution throws an error.

Now, I want to set the request timeout for every request, either taking the value as user input or using a default value.

Error Message:

    WARNING:elasticsearch:HEAD /opportunityci/predictionsci [status:404 request:0.080s]
validated the index and mapping...!
WARNING:elasticsearch:POST http://192.168.204.154:9200/_bulk [status:N/A request:10.003s]
Traceback (most recent call last):
  File "/Users/adaggula/anaconda/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 94, in perform_request
    response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw)
  File "/Users/adaggula/anaconda/lib/python2.7/site-packages/urllib3/connectionpool.py", line 640, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/Users/adaggula/anaconda/lib/python2.7/site-packages/urllib3/util/retry.py", line 238, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Users/adaggula/anaconda/lib/python2.7/site-packages/urllib3/connectionpool.py", line 595, in urlopen
    chunked=chunked)
  File "/Users/adaggula/anaconda/lib/python2.7/site-packages/urllib3/connectionpool.py", line 395, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/Users/adaggula/anaconda/lib/python2.7/site-packages/urllib3/connectionpool.py", line 315, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
ReadTimeoutError: HTTPConnectionPool(host='192.168.204.154', port='9200'): Read timed out. (read timeout=10)
ERROR:DataScience:init exception : Traceback (most recent call last):
  File "/Users/adaggula/Documents/workspace/LatestDemo/demo/com/ci/dataScience/engine/Driver.py", line 194, in <module>
    sample.persist(finalResults)
  File "/Users/adaggula/Documents/workspace/LatestDemo/demo/com/ci/dataScience/ES/sample.py", line 68, in persist
    res = helpers.bulk(client,data,stats_only=True)
  File "/Users/adaggula/anaconda/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 188, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/Users/adaggula/anaconda/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 160, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/Users/adaggula/anaconda/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 89, in _process_bulk_chunk
    raise e
ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='192.168.204.154', port='9200'): Read timed out. (read timeout=10))
asked Jul 12 '16 by Jack Daniel

People also ask

How to increase the default request timeout in Python for Elasticsearch?

1. Increase the default timeout globally when you create the ES client by passing the timeout parameter.
2. Set the timeout per request made by the client. Taken from the Elasticsearch Python docs:

    # only wait for 1 second, regardless of the client's default
    es.cluster.health(wait_for_status='yellow', request_timeout=1)
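A minimal sketch of the first option, assuming the elasticsearch-py client and the host from the question; timeout here sets the client-wide default read timeout:

    from elasticsearch import Elasticsearch

    # every request now gets a 20-second read timeout instead of
    # the library default of 10 seconds
    es = Elasticsearch(['http://192.168.204.154:9200'], timeout=20)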

What is the default wait time in Elasticsearch?

It defaults to 1m (one minute). This guarantees Elasticsearch waits for at least the timeout before failing; the actual wait time can be longer, particularly when multiple waits occur. Note this is the server-side timeout parameter, distinct from the client's request_timeout. The related wait_for_active_shards parameter (optional, string) sets the number of shard copies that must be active before proceeding with the operation.
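For illustration, a hedged example of passing the server-side timeout on a single index call, reusing the index and type names from the question (the document body is made up, and doc_type availability depends on the client version):

    # 'timeout' is the server-side wait (e.g. for shards to become
    # available); it is not the client-side socket read timeout
    es.index(index='opportunityci', doc_type='predictionsci',
             body={'prediction': 0.87}, timeout='1m')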

Can I override a global request timeout per request?

While you can specify the request timeout globally, you can also override it per request. For example, suppose a 10-node cluster has a global timeout of 20 seconds and each call to a node takes 10 seconds: the call can only be tried on 2 nodes before the overall request timeout kills the client call.
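A sketch of that interaction with elasticsearch-py (the second host is hypothetical); note that in the Python client the timeout applies per connection attempt, and retry_on_timeout lets a timed-out call move on to another node:

    from elasticsearch import Elasticsearch

    # 20-second read timeout per attempt; on a read timeout, retry
    # the call on another node, up to max_retries times
    es = Elasticsearch(
        ['http://192.168.204.154:9200', 'http://192.168.204.155:9200'],
        timeout=20,
        retry_on_timeout=True,
        max_retries=2,
    )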

How to increase indexing speed with the bulk API?

The bulk API makes it possible to perform many index/delete operations in a single API call, which can greatly increase indexing speed. Some of the officially supported clients provide helpers to assist with bulk requests and reindexing of documents from one index to another.
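A minimal sketch of the Python bulk helper; the index, type, and document fields are assumptions for illustration:

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    es = Elasticsearch(['http://192.168.204.154:9200'])

    # each action dict becomes one operation in a single bulk call
    actions = [
        {'_index': 'opportunityci', '_type': 'predictionsci',
         '_source': {'prediction': p}}
        for p in (0.1, 0.5, 0.9)
    ]
    success, errors = bulk(es, actions, chunk_size=500)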


1 Answer

Use the request_timeout parameter.

E.g.:

    from elasticsearch.helpers import bulk

    bulk(es, records, chunk_size=500, request_timeout=20)
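Applied to the call in the question's traceback, the fix would presumably look like this; helpers.bulk forwards extra keyword arguments to the underlying client.bulk call:

    # override the client's 10-second default for this bulk load only
    res = helpers.bulk(client, data, stats_only=True, request_timeout=20)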
answered Sep 27 '22 by Lyncean Patel