I have added new mappings (mainly not_analyzed versions of existing fields) I now have to figure out how to reindex the existing data. I have tried following the guide on elastic search website but that is just too confusing. I have also tried using plugins (elasticsearch-reindex, allegro/elasticsearch-reindex-tool). I have looked at ElasticSearch - Reindexing your data with zero downtime which is a similar question. I was hoping to not have to rely on external tools (if possible) and try and use bulk API (as with original insert)
I could easily rebuild the whole index as it's a read only data really but that wont really work in the long term if I should want to add more fields etc etc when I'm in production with it. I wondered if there was anyone who knows of an easy to understand/follow solution or steps for a relative novice to ES. I'm on version 2 and using Windows.
Overview. Reindex is the concept of copying existing data from a source index to a destination index which can be inside the same or a different cluster. Elasticsearch has a dedicated endpoint _reindex for this purpose. A reindexing is mostly required for updating mapping or settings.
Description. REINDEX rebuilds an index using the data stored in the index's table, replacing the old copy of the index. There are several scenarios in which to use REINDEX: An index has become corrupted, and no longer contains valid data.
Important Note on Index Auto-CreationBy default, Elasticsearch has a feature that will automatically create indices. Simply pushing data into a non-existing index will cause that index to be created with mappings inferred from the data.
Re-indexing means to read the data, delete the data in elasticsearch and ingest the data again. There is no such thing like "change the mapping of existing data in place." All the re-indexing tools you mentioned are just wrappers around read->delete->ingest.
You can always adjust the mapping for new indices and add fields later. All the new fields will be indexed with respect to this mapping. Or use dynamic mapping if you are not in control of the new fields.
Have a look at Change default mapping of string to "not analyzed" in Elasticsearch to see how to use dynamic mapping to get not_analyzed fields of strings.
Re-indexing is very expensive. Better way is to create a new index and drop the old one. To achieve this with zero downtime, use index alias for all your customers. Think of an index called "data-version1". In steps:
It's good practice to always use aliases.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With