 

Regarding Elasticsearch memory usage

I am currently using Elasticsearch 0.9.19. The machine has around 300 GB of disk space and around 23 GB of RAM, of which I have allocated about 10 GB to Elasticsearch. My workload is write-intensive, around 1000 docs/s. Elasticsearch is the only process running on the machine. The documents are small, with no more than 10 fields each. Elasticsearch runs on this single machine with 1 shard and 0 replicas.

Memory usage starts climbing very rapidly when I send 1000 docs/s. Although I have allocated only 10 GB of RAM to Elasticsearch, almost 21 GB ends up being consumed and eventually the Elasticsearch process runs out of heap space. Afterwards I have to clear the OS cache to free the memory. Even when I stop sending documents at 1000 docs/s, the memory is not released automatically.

For example, when I run Elasticsearch with around 1000 docs/s of write operations, memory usage climbs to about 18 GB very quickly, and when I later reduce the write rate to only 10 docs/s, usage still stays at around 18 GB. I would expect it to come down as the write rate decreases. I am using the Bulk API for my writes, with 100 docs per request. The data comes from 4 machines when the write rate is around 1000 docs/s.
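For reference, this is roughly how I build each bulk request (a simplified sketch; the index name, type and field names below are placeholders, not my real mapping):

    import json
    import requests

    def send_bulk(docs, es_url="http://localhost:9200"):
        # The _bulk body is newline-delimited JSON: an action line, then the document.
        lines = []
        for doc in docs:
            lines.append(json.dumps({"index": {"_index": "myindex", "_type": "mytype"}}))
            lines.append(json.dumps(doc))
        body = "\n".join(lines) + "\n"
        return requests.post(es_url + "/_bulk", data=body)

    # Roughly 100 small docs per request, as described above.
    send_bulk([{"field1": "value", "field2": 42} for _ in range(100)])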

These are the figures I get from top:

    Mem:  24731664k total, 18252700k used, 6478964k free,  322492k buffers
    Swap:  4194296k total,        0k used, 4194296k free, 8749780k cached

     PID  USER     PR NI VIRT  RES  SHR S %CPU %MEM TIME+     COMMAND
     1004 elastics 20  0 10.7g 8.3g 10m S    1 35.3 806:28.69 java

Does anyone have an idea what could be causing this? I have had to stop my application because of this issue, and I suspect I am missing some configuration. I have already read the cache-related documentation for Elasticsearch here: http://www.elasticsearch.org/guide/reference/index-modules/cache.html

I have also tried clearing the cache using the clear cache API and tried the flush API, but neither gave any improvement.
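For reference, the calls I tried look roughly like this (the index name is a placeholder):

    import requests

    es = "http://localhost:9200"
    requests.post(es + "/myindex/_cache/clear")   # clear the field/filter caches
    requests.post(es + "/myindex/_flush")         # flush the transaction log and commit segments to disk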

Thanks in advance.

asked Nov 25 '12 by user1385154


People also ask

Does Elasticsearch use a lot of memory?

The Elasticsearch process is very memory intensive. Elasticsearch runs on a JVM (Java Virtual Machine), and close to 50% of the memory available on a node should be allocated to the JVM. The JVM uses this memory because Lucene needs to know where to look for index values on disk.

Is Elasticsearch in memory or on disk?

However, Elasticsearch is effectively an on-disk service: it writes the index directly to disk and removes it when asked.

What is heap memory in Elasticsearch?

Overview. The heap size is the amount of RAM allocated to the Java Virtual Machine of an Elasticsearch node. As a general rule, you should set -Xms and -Xmx to the same value, which should be 50% of your total available RAM, subject to a maximum of approximately 31 GB.
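As a rough illustration of that rule (the helper below is just an example, not an official tool):

    def suggested_heap_gb(total_ram_gb):
        # 50% of available RAM, capped at roughly 31 GB (the compressed-oops limit).
        return min(total_ram_gb / 2.0, 31)

    # For the 23 GB machine in the question this gives about 11.5 GB, which would
    # then be passed to the JVM as -Xms11g -Xmx11g (or, on older releases, via the
    # ES_HEAP_SIZE environment variable).
    print(suggested_heap_gb(23))   # -> 11.5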


2 Answers

To summarize the answer on the mailing list thread: the problem was that the Ruby client wasn't able to throttle its inserts, and Lucene memory usage does grow as large numbers of documents are added. I think there may also be an issue with commit frequency: it's important to commit from time to time in order to flush newly added documents to disk. Is the OP still having the problem? If not, could you post the solution?
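If the client cannot throttle itself, even a crude pacing loop between bulk requests, with an occasional explicit flush, can help. A minimal sketch (the endpoint, delay and flush interval are illustrative only):

    import time
    import requests

    es = "http://localhost:9200/myindex"

    def index_with_throttle(batches, delay_s=0.5, flush_every=50):
        # Each element of `batches` is a prepared newline-delimited _bulk payload.
        for i, body in enumerate(batches):
            requests.post(es + "/_bulk", data=body)
            if (i + 1) % flush_every == 0:
                requests.post(es + "/_flush")   # commit new segments to disk now and then
            time.sleep(delay_s)                 # cap the ingest rate so Lucene can keep up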

answered Oct 14 '22 by Mike Sokolov


I think your ingestion rate is too heavy for the cluster's capacity, so data keeps piling up in memory. You should monitor your disk I/O; it is probably the bottleneck.

You should then:

  • slow down the ingestion (you could put a more robust queue in front, such as Kafka or RabbitMQ, or use Logstash's persistent queue)
  • use fast SSD drives to increase I/O capacity
  • add more nodes (and adjust the shards of your indices) for better I/O parallelism; see the sketch after this list for setting the shard count
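For example, the primary shard count can only be set at index-creation time, so something like the following (values are illustrative; extra shards only help once they can be spread over more nodes and disks):

    import json
    import requests

    # Create the index with several primary shards so writes can later be spread
    # across more nodes/disks as the cluster grows.
    settings = {"settings": {"index": {"number_of_shards": 5, "number_of_replicas": 0}}}
    requests.put("http://localhost:9200/myindex", data=json.dumps(settings))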

As a small optimization, you can also improve performance a little by:

  • increasing the refresh_interval. Refreshing consumes RAM, so avoiding refreshes while you are in a heavy ingestion phase can help a lot
  • if it is the first ingestion into the index, removing all replicas during the ingestion phase and re-adding them afterwards (see the settings sketch after this list)
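A sketch of the settings changes described above (the index name and values are only examples):

    import json
    import requests

    es = "http://localhost:9200/myindex"

    # Before heavy ingestion: disable refresh and drop the replicas.
    requests.put(es + "/_settings",
                 data=json.dumps({"index": {"refresh_interval": "-1",
                                            "number_of_replicas": 0}}))

    # ... run the bulk load ...

    # Afterwards: restore a normal refresh interval and re-add the replicas.
    requests.put(es + "/_settings",
                 data=json.dumps({"index": {"refresh_interval": "1s",
                                            "number_of_replicas": 1}}))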
answered Oct 14 '22 by Jaycreation