
What is the ideal bulk size formula in ElasticSearch?

I believe there should be a formula for calculating the bulk indexing size in ElasticSearch. The following are probably the variables of such a formula:

  • Number of nodes
  • Number of shards/index
  • Document size
  • RAM
  • Disk write speed
  • LAN speed

I wonder if anyone knows or uses a mathematical formula. If not, how do people decide their bulk size? By trial and error?

shyos asked Aug 28 '13 13:08


2 Answers

There is no golden rule for this. Quoting the documentation:

There is no “correct” number of actions to perform in a single bulk call. You should experiment with different settings to find the optimum size for your particular workload.
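The experiment the documentation asks for can be sketched roughly as below. This is only an illustration: `tune_bulk_size` and `fake_measure` are hypothetical names, and `measure` is assumed to be a function you write yourself that sends a test batch of the given size to your cluster and returns the observed throughput. The stopping rule (grow until throughput stops improving) is one reasonable choice, not an official algorithm.

```python
def tune_bulk_size(measure, start=1000, factor=2, max_size=64000):
    """Grow the bulk size until the measured throughput stops improving.

    `measure(size)` is assumed to return a throughput figure
    (for example bytes or documents indexed per second) for that size.
    """
    best_size, best_rate = start, measure(start)
    size = start * factor
    while size <= max_size:
        rate = measure(size)
        if rate <= best_rate:
            # Throughput dropped or flattened: the previous size was better.
            break
        best_size, best_rate = size, rate
        size *= factor
    return best_size, best_rate


def fake_measure(size):
    # Stand-in for a real benchmark: a made-up curve that peaks at 8000.
    return 10000 - abs(size - 8000)


best_size, best_rate = tune_bulk_size(fake_measure)
print(best_size, best_rate)
```

In practice `fake_measure` would be replaced by a function that times real bulk requests against a workload that resembles production, since the optimum depends on document size, hardware and cluster layout.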

moliware answered Sep 26 '22 05:09


Read the ES bulk API documentation carefully: https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html#_using_and_sizing_bulk_requests

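  • Find the size by bisection: try 1 KiB, then 20 KiB, then 10 KiB, ... and keep narrowing toward the size with the best throughput
  • Measure bulk size in KiB (or an equivalent byte unit), not in document count!
  • Send data in bulk requests (no streaming), and put redundant information in the API URL if you can (for example the index name, instead of repeating it in every action line)
  • Remove superfluous whitespace from your data if possible
  • Disable search index updates (index refreshes) during the load, and re-enable them afterwards
  • Round-robin requests across all your data nodes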
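As a rough illustration of a few of these points (sizing by bytes, compact JSON, and round-robin over nodes), here is a minimal sketch using only the Python standard library. The node URLs and the `bulk_bodies` helper are hypothetical. The empty `{"index":{}}` action line assumes the target index is given in the request URL, and nothing is actually sent here: the code only plans which body would go to which node.

```python
import json
from itertools import cycle


def bulk_bodies(docs, max_bytes):
    """Yield newline-delimited bulk bodies of at most max_bytes each.

    A single document larger than max_bytes still gets its own body.
    """
    lines, size = [], 0
    for doc in docs:
        # Compact separators drop the superfluous whitespace json.dumps adds.
        source = json.dumps(doc, separators=(",", ":")).encode("utf-8")
        # Assumption: the index name is in the URL, so the action line is empty.
        pair = b'{"index":{}}\n' + source + b"\n"
        if lines and size + len(pair) > max_bytes:
            yield b"".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += len(pair)
    if lines:
        yield b"".join(lines)


# Hypothetical data nodes, visited in round-robin order.
nodes = cycle(["http://node1:9200", "http://node2:9200"])
docs = [{"id": i, "text": "x" * 50} for i in range(100)]
plan = [(next(nodes), body) for body in bulk_bodies(docs, 1024)]
print(len(plan), "bulk requests planned")
```

Each `(node, body)` pair in `plan` would then be sent as one bulk request with whatever HTTP client you already use; the `max_bytes` value of 1024 is only there to keep the example small.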
Christophe Roussy answered Sep 24 '22 05:09