I'm new to Elasticsearch and have been using it to store data scraped from the web, passing it on to Kibana for analysis.
However, I keep needing to tune my mappings, and from what I gather you can't change the mapping of an existing field on the fly. So far, every time I've had to adjust my mappings I've had to delete the index, create a new mapping and then kick off my crawl again (painful!).
So what I'd like to do is easily back up the data from existing crawls and the mappings separately, so I can restore perhaps just the data as I incrementally adjust my mappings.
I've looked at using elasticdump, and whilst it seems fairly clear that I can dump the mappings and the data out as JSON files, can I also use elasticdump to reimport the data and/or mappings from those archived JSON files?
Thanks for any thoughts / advice!
In the end I used elasticdump, and it was really straightforward and easy to use.
I haven't yet had to recreate my Elasticsearch database with a different mapping, so I haven't tested the full loop as I thought I would need to. But I can report that elasticdump lets you export the whole dataset in JSON format (complete with the index metadata for each entry) and the mappings separately. You can also export the analyzers separately, but I had no need to.
Now, with these two files, if I needed to create a new instance with different mappings and analyzer settings, I believe I could adjust the mappings file manually, import the mappings into a new database and then import the data. Then, when I use Kibana for visualisation, I just have to pick up the new index.
A few commands for ease of reference:
npm install elasticdump -g
Then for data:
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=/data/my_index_data.json \
--type=data
Then for mappings:
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=/data/my_index_mapping.json \
--type=mapping
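For the reimport I haven't run these yet, but per the elasticdump docs it's just the same commands with --input and --output swapped, importing the mapping first and then the data:
elasticdump \
--input=/data/my_index_mapping.json \
--output=http://production.es.com:9200/my_index \
--type=mapping
elasticdump \
--input=/data/my_index_data.json \
--output=http://production.es.com:9200/my_index \
--type=data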
When I get round to testing this in a new environment I'll be able to confirm the reimport actually works, but I thought I would update for now in case anyone else is looking at options.
(FYI, you can also export from one ES database and output directly into another.)
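For example, to copy an index's data straight from one cluster to another (the staging host here is just a placeholder):
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=data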
You can take a backup of Elasticsearch data using the snapshot and restore API provided by Elasticsearch itself. Example:
PUT /_snapshot/my_backup/snapshot_1
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false
}
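Note that the my_backup repository must be registered before the first snapshot is taken, and restoring is a single call as well. A minimal sketch, assuming a shared-filesystem repository whose location (an example path here) is whitelisted via path.repo in elasticsearch.yml:
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/my_backup"
  }
}
POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": "index_1,index_2"
}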
Here is a blogpost explaining how to take a backup of Elasticsearch data, and if you want to see backup and restore in action, watch this video tutorial.
There are a couple of ways to change a mapping in Elasticsearch with no downtime and no complete rebuild of your index.
The latest is the new reindex API, which may be found here. And here is an article with a simple example of how to use it.
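For reference, a minimal _reindex call looks like this (the index names are placeholders):
POST /_reindex
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" }
}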
Another way (the official one) is to use the scroll and bulk APIs to reindex from one index to a newer one. The catch is to use aliases so you keep the same "index" name while still changing its mappings, which is described here. The key concept is to create an alias for every index and to index data through the alias rather than the real index name. When you need to change your mappings, you create a new index with the new mappings, reindex all the data into it, and then switch the alias to point to the new index.
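A sketch of the alias swap, with placeholder index names; both actions are applied atomically in a single call, so clients querying the alias never see a gap:
POST /_aliases
{
  "actions": [
    { "remove": { "index": "my_index_v1", "alias": "my_index" } },
    { "add": { "index": "my_index_v2", "alias": "my_index" } }
  ]
}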
Regarding saving an index, I don't know of an official way to save an index in Elasticsearch without its mappings.
Anyway, if you want your data moved to another cluster in another location and then reindexed, you may find this article about snapshotting/restoring Elasticsearch data to/from Amazon S3 helpful.
And this link leads to the Elasticsearch documentation on how to snapshot/restore to/from other filesystems.
Hope I have managed to help!