 

Regarding Elasticsearch memory usage

I am currently using Elasticsearch 0.9.19. The machine has around 300 GB of disk space and around 23 GB of RAM, of which I have allocated about 10 GB to Elasticsearch. My workload is write-intensive, around 1000 docs/s. Elasticsearch is the only process running on the machine. The documents are small, with no more than 10 fields each. Elasticsearch runs on this single machine with 1 shard and 0 replicas.

Memory usage starts climbing very rapidly when I send 1000 docs/s. Although I have allocated only 10 GB of RAM to Elasticsearch, almost 21 GB ends up being consumed and eventually the Elasticsearch process runs out of heap space. Afterwards I have to clear the OS cache to free the memory. Even when I stop sending documents at 1000 docs/s, the memory is not released automatically.

For example, when I run Elasticsearch with around 1000 docs/s of write operations, memory usage climbs to about 18 GB very quickly, and when I later reduce the write rate to only 10 docs/s, usage still stays at around 18 GB. I would expect it to come down as the write rate decreases. I am using the Bulk API for my writes, with 100 docs per request. The data comes from 4 machines when the write rate is around 1000 docs/s.
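For reference, this is roughly how I build each bulk request (a simplified sketch; the index name, type and field names below are placeholders, not my real mapping):

    import json
    import requests

    def send_bulk(docs, es_url="http://localhost:9200"):
        # The _bulk body is newline-delimited JSON: an action line, then the document.
        lines = []
        for doc in docs:
            lines.append(json.dumps({"index": {"_index": "myindex", "_type": "mytype"}}))
            lines.append(json.dumps(doc))
        body = "\n".join(lines) + "\n"
        return requests.post(es_url + "/_bulk", data=body)

    # Roughly 100 small docs per request, as described above.
    send_bulk([{"field1": "value", "field2": 42} for _ in range(100)])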

These are the figures I get from top:

    Mem:  24731664k total, 18252700k used, 6478964k free,  322492k buffers
    Swap:  4194296k total,        0k used, 4194296k free, 8749780k cached

     PID  USER     PR NI VIRT  RES  SHR S %CPU %MEM TIME+     COMMAND
     1004 elastics 20  0 10.7g 8.3g 10m S    1 35.3 806:28.69 java

Does anyone have an idea what could be causing this? I have had to stop my application because of this issue, and I suspect I am missing some configuration. I have already read the cache-related documentation for Elasticsearch here: http://www.elasticsearch.org/guide/reference/index-modules/cache.html

I have also tried clearing the cache using the clear cache API and tried the flush API, but neither gave any improvement.
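For reference, the calls I tried look roughly like this (the index name is a placeholder):

    import requests

    es = "http://localhost:9200"
    requests.post(es + "/myindex/_cache/clear")   # clear the field/filter caches
    requests.post(es + "/myindex/_flush")         # flush the transaction log and commit segments to disk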

Thanks in advance.

asked Nov 25 '12 by user1385154


People also ask

Does Elasticsearch use a lot of memory?

The Elasticsearch process is very memory intensive. Elasticsearch runs on a JVM (Java Virtual Machine), and close to 50% of the memory available on a node should be allocated to the JVM. The JVM uses this memory because Lucene needs to know where to look for index values on disk.

Is Elasticsearch in memory or on disk?

However, Elasticsearch is effectively an on-disk service: it writes the index directly to disk and removes it when asked.

What is heap memory in Elasticsearch?

Overview. The heap size is the amount of RAM allocated to the Java Virtual Machine of an Elasticsearch node. As a general rule, you should set -Xms and -Xmx to the same value, which should be 50% of your total available RAM, subject to a maximum of approximately 31 GB.
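As a rough illustration of that rule (the helper below is just an example, not an official tool):

    def suggested_heap_gb(total_ram_gb):
        # 50% of available RAM, capped at roughly 31 GB (the compressed-oops limit).
        return min(total_ram_gb / 2.0, 31)

    # For the 23 GB machine in the question this gives about 11.5 GB, which would
    # then be passed to the JVM as -Xms11g -Xmx11g (or, on older releases, via the
    # ES_HEAP_SIZE environment variable).
    print(suggested_heap_gb(23))   # -> 11.5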


2 Answers

To summarize the answer on the mailing list thread: the problem was that the Ruby client wasn't able to throttle its inserts, and Lucene memory usage does grow as large numbers of documents are added. I think there may also be an issue with commit frequency: it's important to commit from time to time in order to flush newly added documents to disk. Is the OP still having the problem? If not, could you post the solution?
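If the client cannot throttle itself, even a crude pacing loop between bulk requests, with an occasional explicit flush, can help. A minimal sketch (the endpoint, delay and flush interval are illustrative only):

    import time
    import requests

    es = "http://localhost:9200/myindex"

    def index_with_throttle(batches, delay_s=0.5, flush_every=50):
        # Each element of `batches` is a prepared newline-delimited _bulk payload.
        for i, body in enumerate(batches):
            requests.post(es + "/_bulk", data=body)
            if (i + 1) % flush_every == 0:
                requests.post(es + "/_flush")   # commit new segments to disk now and then
            time.sleep(delay_s)                 # cap the ingest rate so Lucene can keep up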

answered Oct 14 '22 by Mike Sokolov


I think your ingestion rate is too heavy for the cluster's capacity, so data keeps piling up in memory. You should monitor your disk I/O; it is probably the bottleneck.

You should then:

  • slow down the ingestion (you could put a more robust queue in front, such as Kafka or RabbitMQ, or use Logstash's persistent queue)
  • use fast SSD drives to increase I/O capacity
  • add more nodes (and adjust the shards of your indices) for better I/O parallelism; see the sketch after this list for setting the shard count
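For example, the primary shard count can only be set at index-creation time, so something like the following (values are illustrative; extra shards only help once they can be spread over more nodes and disks):

    import json
    import requests

    # Create the index with several primary shards so writes can later be spread
    # across more nodes/disks as the cluster grows.
    settings = {"settings": {"index": {"number_of_shards": 5, "number_of_replicas": 0}}}
    requests.put("http://localhost:9200/myindex", data=json.dumps(settings))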

As a small optimization, you can also improve performance a little by:

  • increasing the refresh_interval. Refreshing consumes RAM, so avoiding refreshes while you are in a heavy ingestion phase can help a lot
  • if it is the first ingestion into the index, removing all replicas during the ingestion phase and re-adding them afterwards (see the settings sketch after this list)
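A sketch of the settings changes described above (the index name and values are only examples):

    import json
    import requests

    es = "http://localhost:9200/myindex"

    # Before heavy ingestion: disable refresh and drop the replicas.
    requests.put(es + "/_settings",
                 data=json.dumps({"index": {"refresh_interval": "-1",
                                            "number_of_replicas": 0}}))

    # ... run the bulk load ...

    # Afterwards: restore a normal refresh interval and re-add the replicas.
    requests.put(es + "/_settings",
                 data=json.dumps({"index": {"refresh_interval": "1s",
                                            "number_of_replicas": 1}}))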
answered Oct 14 '22 by Jaycreation