Is there a way to set the default compression method to best_compression for newly created indexes in Elasticsearch?
Obviously, it can be done manually after an index has been created.
Based on some googling, this should be achievable either by setting it in elasticsearch.yml or by creating a custom template. However, I haven't been able to get it working in elasticsearch.yml. I've tried all sorts of variations, but basically this should do it:
index.codec: best_compression
But it doesn't.
I'm also not comfortable with creating a custom template, since my goal is to apply this compression to all indexes, not just those created from some specific template. But if it's the only way, so be it.
My use case is Elasticsearch with Logstash, so Logstash creates these indexes. Without custom templates, setting the compression method in the Logstash configs seems to be impossible. I'm running Elasticsearch 2.2.0.
Currently I can set the compression manually just fine by closing the index and executing:
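If a template turns out to be the way to go, it doesn't have to be tied to any particular index name: a template whose pattern is `*` matches every newly created index, and Elasticsearch merges the settings of all matching templates (higher `order` values override lower ones), so it can coexist with the template Logstash installs. Below is a minimal sketch using the 2.x template syntax, where the pattern goes in the `template` field (newer versions use `index_patterns`). The template name `all_best_compression` and the `ES_URL` variable are hypothetical, and it assumes a node on `localhost:9200`.

```shell
#!/bin/sh
# Sketch: install a catch-all index template (Elasticsearch 2.x syntax) so that
# every index created from now on gets index.codec=best_compression.
# Assumption: ES_URL and the template name passed to put_template are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the template body. "template": "*" matches every new index; order 0
# lets templates with a higher order override these settings if they need to.
template_body() {
  printf '%s' '{"template":"*","order":0,"settings":{"index.codec":"best_compression"}}'
}

# PUT the template under the given name, e.g. put_template all_best_compression
put_template() {
  curl -s -XPUT "$ES_URL/_template/$1" \
    -H 'Content-Type: application/json' \
    -d "$(template_body)"
}
```

Running `put_template all_best_compression` once is enough; existing indexes are not touched, only indexes created afterwards (for example the next daily `logstash-*` index).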
curl -XPUT 'http://localhost:9200/example_index/_settings' -d '{"index":{"codec":"best_compression"}}'
And then reopening the index.
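The close / update / reopen sequence can be wrapped in a small script. This is a sketch assuming a node on `localhost:9200`; `ES_URL`, `codec_body` and `set_codec` are hypothetical names. It tries to reopen the index even if the settings update fails, so the index isn't left closed.

```shell
#!/bin/sh
# Sketch: switch an existing index to another codec by closing it, updating the
# static index.codec setting, and reopening it.
# Assumption: ES_URL defaults to a local node; function names are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the settings body for the given codec name.
codec_body() {
  printf '{"index":{"codec":"%s"}}' "$1"
}

# Usage: set_codec example_index [best_compression]
set_codec() {
  index="$1"
  codec="${2:-best_compression}"
  curl -sf -XPOST "$ES_URL/$index/_close" >/dev/null || return 1
  curl -sf -XPUT "$ES_URL/$index/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(codec_body "$codec")" >/dev/null
  status=$?
  # Reopen even if the update failed, so the index isn't left closed.
  curl -sf -XPOST "$ES_URL/$index/_open" >/dev/null || return 1
  return "$status"
}
```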
From official documentation:
index.codec
The default value compresses stored data with LZ4 compression, but this can be set to best_compression which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance.
Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html
In particular, in Lucene 4.1 the codec changed to compress the document store automatically. It works by grouping documents into blocks of 16KB and compressing each block together using LZ4, a lightweight compression algorithm.
Indexes themselves have no limit, but shards do: the recommended maximum is 20 shards per GB of JVM heap on a node (you can check the heap size on Kibana's Stack Monitoring tab). For example, with 5GB of JVM heap, that means at most 100 shards.
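To get a feel for block-wise compression, here is a rough illustration, not Lucene's actual implementation: it splits a file into 16KB chunks and compresses each chunk on its own. It uses gzip (DEFLATE, the algorithm behind best_compression) instead of LZ4 because gzip is available almost everywhere; it also splits raw bytes rather than whole documents, and gzip adds a small header per chunk, so the numbers are only approximate. The function name is hypothetical.

```shell
#!/bin/sh
# Rough illustration: split a file into 16KB blocks, compress each block
# independently with gzip (DEFLATE), and print the total compressed size in bytes.
block_compressed_size() {
  file="$1"
  tmp=$(mktemp -d) || return 1
  split -b 16384 "$file" "$tmp/block."
  total=0
  for b in "$tmp"/block.*; do
    [ -f "$b" ] || continue   # an empty input produces no blocks
    size=$(gzip -c "$b" | wc -c | tr -d ' ')
    total=$((total + size))
  done
  rm -rf "$tmp"
  echo "$total"
}
```

For example, `block_compressed_size docs.json` on a file of repetitive JSON log lines prints a size far smaller than the file itself, because similar documents within a block share a lot of content.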
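The rule of thumb is simple arithmetic: shard ceiling = 20 × heap in GB. A small sketch, assuming the heap is given the way it is passed to the JVM (`5g`, `512m`); the function name is hypothetical and integer arithmetic rounds down.

```shell
#!/bin/sh
# Sketch: convert a heap size such as "5g" or "512m" into a shard ceiling
# of 20 shards per GB of heap (per node). Integer values only; rounds down.
max_shards_for_heap() {
  case "$1" in
    *[gG]) mb=$(( ${1%?} * 1024 )) ;;
    *[mM]) mb=${1%?} ;;
    *) echo "unsupported heap size: $1" >&2; return 1 ;;
  esac
  echo $(( mb * 20 / 1024 ))
}
```

With this, `max_shards_for_heap 5g` gives 100, matching the example above, and `max_shards_for_heap 512m` gives 10.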
number_of_shards to route documents to a primary shard. See _routing field. Elasticsearch uses this value when splitting an index.
The index.codec setting is a node-level setting, and it won't be visible in the list of settings for a new index. If the index template explicitly sets the codec, that one will be used; otherwise, the node-level one will be.
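Since a codec set in elasticsearch.yml won't appear in an index's `_settings`, one way to confirm the node picked it up is to look at the node settings instead. This sketch assumes the Nodes Info API (`_nodes/settings`) echoes the settings each node loaded from elasticsearch.yml; the function names and `ES_URL` are hypothetical.

```shell
#!/bin/sh
# Sketch: check the node-level settings for index.codec=best_compression.
# Assumption: _nodes/settings reports what each node loaded from elasticsearch.yml.
ES_URL="${ES_URL:-http://localhost:9200}"

# Succeeds if the JSON on stdin contains a best_compression codec setting, in
# either nested ("codec": ...) or flat ("index.codec": ...) form.
mentions_best_compression() {
  grep -q 'codec" *: *"best_compression"'
}

# Succeeds if any node reports the setting,
# e.g. node_has_best_compression && echo "node default is best_compression"
node_has_best_compression() {
  curl -sf "$ES_URL/_nodes/settings" | mentions_best_compression
}
```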
Also, when you change the codec for an index, only new segments (created by new indexing, updates to existing documents, or segment merges) will use the new codec.
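To get existing data rewritten with the new codec, the segments have to be merged. A sketch using the Force Merge API, assuming Elasticsearch 2.1 or later (older versions call it `_optimize`); merging down to one segment is I/O heavy, so it is best run on indexes that are no longer written to, such as older daily Logstash indexes. If a shard already consists of a single segment, the merge may have nothing to do and those segments can keep the old codec. Function names and `ES_URL` are hypothetical.

```shell
#!/bin/sh
# Sketch: force-merge an index so its segments are rewritten with the current codec.
# Assumptions: Elasticsearch >= 2.1 (_forcemerge); ES_URL defaults to a local node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the force-merge URL for an index and a target segment count (default 1).
forcemerge_url() {
  printf '%s/%s/_forcemerge?max_num_segments=%s' "$ES_URL" "$1" "${2:-1}"
}

# Usage: force_merge example_index [max_num_segments]
force_merge() {
  curl -sf -XPOST "$(forcemerge_url "$1" "$2")"
}
```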