My cluster has one index per day going back a few months, with 5 shards per index (the default), and I can't run queries across the whole cluster because there are too many shards (over 1000).
The document IDs are automatically generated.
How can I combine the indexes into one index, deal with conflicting IDs (if conflicts are even possible), and change the types?
I am using ES version 5.2.1
No, Elasticsearch does not support joins between indices. There is some limited join-like behaviour within an index, but it comes with a number of restrictions, so it is generally recommended to denormalise your data for best performance.
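For completeness, the join-like behaviour referred to here is the parent/child feature available in 5.x through the _parent mapping. A minimal sketch, assuming a hypothetical index my-index with illustrative question and answer types:
curl -XPUT 'localhost:9200/my-index?pretty' -H 'Content-Type: application/json' -d' { "mappings": { "question": {}, "answer": { "_parent": { "type": "question" } } } } '
curl -XGET 'localhost:9200/my-index/_search?pretty' -H 'Content-Type: application/json' -d' { "query": { "has_child": { "type": "answer", "query": { "match_all": {} } } } } '
Even so, such queries are noticeably slower than querying denormalised documents, which is why denormalisation is usually the better choice.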
Use the force merge API to force a merge on the shards of one or more indices. Merging reduces the number of segments in each shard by merging some of them together, and also frees up the space used by deleted documents. Merging normally happens automatically, but sometimes it is useful to trigger a merge manually.
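A minimal invocation, where the index name my-index and the max_num_segments value are illustrative:
curl -XPOST 'localhost:9200/my-index/_forcemerge?max_num_segments=1&pretty'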
This is a common problem that only becomes visible after a few months of using the ELK stack with filebeat
creating indices day by day. There are a few options to fix the performance issue here.
_forcemerge
First, you can use _forcemerge
to limit the number of segments inside each Lucene index. This operation won't reduce or merge the indices themselves, but it will improve the performance of Elasticsearch.
curl -XPOST 'localhost:9200/logstash-2017.07*/_forcemerge?max_num_segments=1'
This will run through all of the month's indices and force merge their segments. When done for every month, it should improve Elasticsearch's performance considerably. In my case CPU usage went down from 100% to 2.7%.
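You can check the segment counts before and after with the _cat/segments API (the index pattern mirrors the example above):
curl -XGET 'localhost:9200/_cat/segments/logstash-2017.07*?v'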
Unfortunately this won't solve the shards problem.
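To see how big the shards problem actually is, you can count them via the _cat/shards API, which prints one line per shard:
curl -XGET 'localhost:9200/_cat/shards' | wc -l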
_reindex
Please read the _reindex
documentation and back up your database before continuing.
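As a backup, a minimal snapshot sketch; the repository name my_backup and the location /mnt/es-backups are illustrative, and the location must be listed under path.repo in elasticsearch.yml:
curl -XPUT 'localhost:9200/_snapshot/my_backup?pretty' -H 'Content-Type: application/json' -d' { "type": "fs", "settings": { "location": "/mnt/es-backups" } } '
curl -XPUT 'localhost:9200/_snapshot/my_backup/pre-reindex?wait_for_completion=true&pretty' -H 'Content-Type: application/json' -d' { "indices": "logstash-*" } '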
As tomas mentioned, if you want to reduce the number of shards or indices, there is no other option than to use _reindex
to merge several indices into one. This can take a while depending on the number and size of the indices you have.
You can create the destination index beforehand and specify the number of shards it should contain. This ensures your final index will have the number of shards you need.
curl -XPUT 'localhost:9200/new-logstash-2017.07.01?pretty' -H 'Content-Type: application/json' -d' { "settings" : { "index" : { "number_of_shards" : 1 } } } '
If you want to limit the number of shards per index, you can run _reindex
one to one. In this case no entries should be dropped, as it will be an exact copy, just with a smaller number of shards.
curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d' { "conflicts": "proceed", "source": { "index": "logstash-2017.07.01" }, "dest": { "index": "logstash-v2-2017.07.01", "op_type": "create" } } '
After this operation you can remove the old index and use the new one. Unfortunately, if you want to keep the old name, you need to _reindex
one more time with the new name. If you decide to do that,
DON'T FORGET TO SPECIFY THE NUMBER OF SHARDS FOR THE NEW INDEX! By default it will fall back to 5.
curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d' { "conflicts": "proceed", "source": { "index": "logstash-2017.07*" }, "dest": { "index": "logstash-2017.07", "op_type": "create" } } '
When done, you should have all entries from logstash-2017.07.01 to logstash-2017.07.31 merged into logstash-2017.07. Note that the old indices must be deleted manually.
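Deleting the old daily indices can be done with the delete index API. Note the trailing dot in the pattern below, which keeps the wildcard from matching the newly merged logstash-2017.07 index:
curl -XDELETE 'localhost:9200/logstash-2017.07.*?pretty'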
Some of the entries can be overwritten or skipped, depending on which conflicts
and op_type
options you choose: with op_type set to create and conflicts set to proceed, documents whose IDs already exist in the destination are skipped rather than overwritten.
You can set up an index template that will be applied every time a new logstash
index is created.
curl -XPUT 'localhost:9200/_template/template_logstash?pretty' -H 'Content-Type: application/json' -d' { "template" : "logstash-*", "settings" : { "number_of_shards" : 1 } } '
This ensures that every new index whose name matches logstash-
will have only one shard.
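You can confirm the template was stored with:
curl -XGET 'localhost:9200/_template/template_logstash?pretty'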
If you don't stream too many logs, you can configure your logstash
to group logs by month.
# file: /etc/logstash/conf.d/30-output.conf
output {
  elasticsearch {
    hosts => ["localhost"]
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM}"
    document_type => "%{[@metadata][type]}"
  }
}
It's not easy to fix an initial misconfiguration! Good luck with optimising your Elasticsearch!