 

Compression to Elasticsearch indexes

Is there a way to set default compression method to best_compression with newly created indexes in Elasticsearch?

Obviously it can be done manually after the index has been created.

Based on googling, this should be achievable either by setting it in elasticsearch.yml, or by creating a custom template to set it. However, I haven't been able to get it right in elasticsearch.yml. I've tried all sorts of variations, but basically this should do it:

index.codec: best_compression

But it doesn't.

I'm also not comfortable with creating a custom template, since my goal is to get this compression to all indexes, not just those created using some specific template. But if it's the only way, so be it.
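For what it's worth, a template does not have to be tied to specific indexes: in Elasticsearch 2.x the `template` field of an index template accepts a wildcard pattern, so `"*"` matches every newly created index. A sketch (the template name `compression_default` is made up for illustration):

```shell
# Register a template whose pattern "*" matches all new indexes.
# Only indexes created after this call are affected; existing
# indexes keep their current codec.
curl -XPUT 'http://localhost:9200/_template/compression_default' -d '{
  "template": "*",
  "order": 0,
  "settings": {
    "index.codec": "best_compression"
  }
}'
```

Keeping `order` low should let more specific templates (such as the one Logstash installs for its mappings) still apply on top of it, though it's worth verifying the orders don't clash.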

My use case is Elasticsearch with Logstash, so Logstash is the creator of these indexes. Without custom templates, setting the compression method in Logstash configs seems completely impossible. I'm running Elasticsearch version 2.2.0.
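That said, the Logstash `elasticsearch` output plugin can manage a custom template for you via its `template`, `template_name` and `template_overwrite` options, so the template route doesn't have to mean maintaining it by hand outside Logstash. A sketch, assuming a plugin version compatible with ES 2.2 (the file path is hypothetical):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Point at a copy of the default Logstash template with
    # "index.codec": "best_compression" added to its settings.
    template => "/etc/logstash/templates/logstash_best_compression.json"
    template_name => "logstash"
    template_overwrite => true
  }
}
```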

Currently I can set the compression manually just fine after closing the index and executing:

curl -XPUT 'http://localhost:9200/example_index/_settings' -d '{"index":{"codec":"best_compression"}}'

And then reopening the index.
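Spelled out as the full close/update/reopen sequence (the codec can only be changed on a closed index in ES 2.x; the index name is taken from the example above):

```shell
# 1. Close the index so its settings can be changed.
curl -XPOST 'http://localhost:9200/example_index/_close'
# 2. Switch the codec to best_compression.
curl -XPUT 'http://localhost:9200/example_index/_settings' \
     -d '{"index":{"codec":"best_compression"}}'
# 3. Reopen the index for indexing and search.
curl -XPOST 'http://localhost:9200/example_index/_open'
```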

From official documentation:

index.codec

The default value compresses stored data with LZ4 compression, but this can be set to best_compression which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance.

Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html

empe asked Jul 13 '16



1 Answer

The index.codec setting is a node-level setting, so it won't show up in the list of settings for a newly created index. If an index template explicitly sets the codec, that one is used; otherwise the node-level value applies.

Also, when changing the codec for an index, only new segments (created by new indexing, updates to existing documents, or segment merges) will use the new codec.
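One way to make the existing data pick up the new codec is to trigger a force merge after reopening the index, which rewrites the segments (the `_forcemerge` endpoint is available from ES 2.1; older versions called it `_optimize`). Note this can be I/O-heavy on large indexes:

```shell
# Rewrite the index down to a single segment so all stored
# fields are re-encoded with the current codec.
curl -XPOST 'http://localhost:9200/example_index/_forcemerge?max_num_segments=1'
```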

Andrei Stefan answered Oct 05 '22