
What is the maximum Elasticsearch document size?

I have read that Lucene is limited to 2 GB per document. Are there any additional limits on the size of documents that can be indexed in Elasticsearch?

Asimov4 asked Mar 03 '15 20:03



2 Answers

Lucene internally uses a byte buffer that is addressed with 32-bit integers. This caps the size of a document at 2^31-1 bytes, so about 2 GB is the theoretical maximum.

In Elasticsearch:

The ES source code on GitHub defines a maximum HTTP request size, and it is checked against Integer.MAX_VALUE, i.e. 2^31-1. So 2 GB is effectively the maximum document size for bulk indexing over HTTP. In addition, ES does not start processing an HTTP request until the whole request has been received.
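To make the number concrete, here is the arithmetic behind the "2 GB" figure as a small Python sketch. It only restates the 2^31-1 value mentioned above; nothing in it is taken from the Lucene or Elasticsearch code.

```python
# Largest value of a signed 32-bit integer (what Java calls Integer.MAX_VALUE).
INT_MAX = 2**31 - 1

print(INT_MAX)                      # 2147483647 bytes
print(INT_MAX / 1024**3 < 2)        # True: one byte short of 2 GiB
print(round(INT_MAX / 1000**3, 2))  # 2.15 when counted in decimal GB
```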

Good Practices:

  • Do not use a very large Java heap if you can help it: set it only as large as necessary (ideally no more than half of the machine’s RAM) to hold the maximum working set for your usage of Elasticsearch. This leaves the remaining (hopefully sizable) RAM for the OS to use for I/O caching.
  • On the client side, always use the bulk API, which indexes multiple documents in one request, and experiment to find the right number of documents to send with each bulk request. The optimal size depends on many factors, but err in the direction of too few rather than too many documents. Send bulk requests concurrently, using client-side threads or separate asynchronous requests.
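As a rough illustration of the batching advice, the sketch below splits documents into newline-delimited bodies suitable for the `_bulk` endpoint, capped by both document count and byte size. The function name `bulk_bodies` and the default caps (500 documents, 5 MiB) are made up for this example, not recommended values; tune them by experiment as described above.

```python
import json

def bulk_bodies(docs, index, max_docs=500, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON request bodies for _bulk, one string per batch.

    max_docs and max_bytes are illustrative defaults, not tuned values.
    """
    lines, count, size = [], 0, 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index}})
        source = json.dumps(doc)
        # Each document contributes an action line and a source line,
        # each terminated by a newline.
        pair_size = len(action.encode("utf-8")) + len(source.encode("utf-8")) + 2
        # Flush the current batch if adding this document would exceed a cap.
        # A single oversized document is still emitted, alone in its batch.
        if count and (count >= max_docs or size + pair_size > max_bytes):
            yield "\n".join(lines) + "\n"
            lines, count, size = [], 0, 0
        lines.extend([action, source])
        count += 1
        size += pair_size
    if lines:
        yield "\n".join(lines) + "\n"

batches = list(bulk_bodies(({"id": i} for i in range(5)), "products", max_docs=2))
print(len(batches))  # 3 batches: 2 + 2 + 1 documents
```

Each yielded string would be sent as the body of one bulk request (content type `application/x-ndjson`). The batches are independent, so they could be handed to a thread pool or an async client to get the concurrent requests mentioned above.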

For further study refer to these links:

  1. Performance considerations for elasticsearch indexing

  2. Document maximum size for bulk indexing over HTTP

Utsav Dawn answered Sep 18 '22 16:09


I think things have changed slightly over the years with Elasticsearch. The 7.x documentation, under General Recommendations, says:

Given that the default http.max_content_length is set to 100MB, Elasticsearch will refuse to index any document that is larger than that. You might decide to increase that particular setting, but Lucene still has a limit of about 2GB.

So it would seem that ES has a default limit of ~100 MB per request, which can be raised, while Lucene's limit is about 2 GB, as the other answer stated.
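If you want to catch oversized documents before sending them, a client-side check along these lines might help. This is only a sketch: `fits_in_request` is a hypothetical helper, and it assumes the 100MB default means 100 * 1024 * 1024 bytes and that your client serializes the document roughly the way `json.dumps` does.

```python
import json

# Default http.max_content_length from the documentation quote above.
# Assumption: "100MB" is interpreted as 100 * 1024 * 1024 bytes.
DEFAULT_MAX_CONTENT_LENGTH = 100 * 1024 * 1024

def fits_in_request(doc, limit=DEFAULT_MAX_CONTENT_LENGTH):
    """Return True if the JSON-serialized document is within the limit."""
    body = json.dumps(doc).encode("utf-8")
    return len(body) <= limit

print(fits_in_request({"title": "small document"}))   # True
print(fits_in_request({"blob": "x" * 200}, limit=100)) # False
```

Keep in mind that the limit applies to the whole HTTP request, so for a bulk request it is the combined size of all documents in the batch that matters.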

slm answered Sep 17 '22 16:09