Is there a smarter way to reindex elasticsearch?

I ask because our search is in a state of flux as we work things out, and each time we change the index (a different tokenizer or filter, or a different number of shards/replicas), we have to blow away the entire index and re-index all our Rails models into Elasticsearch. This means we have to factor in downtime to re-index all our records.

Is there a smarter way to do this that I'm not aware of?

concept47 asked Dec 13 '12 00:12


1 Answer

I think @karmi has it right, but let me explain it a bit more simply. I occasionally need to upgrade a production schema with new properties or analysis settings. I recently started using the scenario described below to do live, constant-load, zero-downtime index migrations. You can do it remotely.

Here are the steps:

Assumptions:

  • You have an index real1 with the aliases real_write and real_read pointing to it,
  • the client writes only to real_write and reads only from real_read,
  • the _source field of each document is available (it is what gets copied into the new index).
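If the aliases do not exist yet, a setup along these lines would get you to the starting point assumed above. This is only a sketch: the function names are made up for illustration, and esserver:9200 is the host used in the examples in this answer.

```shell
#!/bin/sh
# Host taken from the examples in this answer; override via ES_URL.
ES_URL="${ES_URL:-http://esserver:9200}"

# Print an _aliases body that adds real_write and real_read to the index $1.
initial_aliases_body() {
  printf '{"actions":[{"add":{"index":"%s","alias":"real_write"}},{"add":{"index":"%s","alias":"real_read"}}]}\n' "$1" "$1"
}

# Send the body to the cluster (hypothetical helper, needs a reachable cluster).
setup_aliases() {
  curl -s -XPOST "$ES_URL/_aliases" \
    -H 'Content-Type: application/json' \
    -d "$(initial_aliases_body "$1")"
}

# Example: setup_aliases real1
```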

1. New index

Create the real2 index with the new mapping and settings of your choice.
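As a rough sketch of this step, the request below creates an index with explicit settings. The shard and replica counts are placeholder values, not recommendations, and the mapping is left out because its shape depends on your Elasticsearch version. The function names are hypothetical.

```shell
#!/bin/sh
# Host taken from the examples in this answer; override via ES_URL.
ES_URL="${ES_URL:-http://esserver:9200}"

# Print a create-index body with $1 shards and $2 replicas.
# The numbers passed in are illustrative assumptions only.
create_index_body() {
  printf '{"settings":{"index":{"number_of_shards":%d,"number_of_replicas":%d}}}\n' "$1" "$2"
}

# Create the index named $1 with $2 shards and $3 replicas
# (needs a reachable cluster).
create_index() {
  curl -s -XPUT "$ES_URL/$1" \
    -H 'Content-Type: application/json' \
    -d "$(create_index_body "$2" "$3")"
}

# Example: create_index real2 3 1
```

Analysis settings and mappings would go into the same body, next to "settings".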

2. Writer alias switch

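Switch the write alias with the following request, which removes and adds the alias in a single call (the Content-Type header is required by Elasticsearch 6.0 and later):

curl -XPOST 'http://esserver:9200/_aliases' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "remove" : { "index" : "real1", "alias" : "real_write" } },
        { "add" : { "index" : "real2", "alias" : "real_write" } }
    ]
}'

This is an atomic operation. From this point on, real2 receives the clients' new data on all nodes. Readers still use the old real1 via real_read, so newly written documents are not visible to them until the reader alias is switched in step 4. This is eventual consistency.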

3. Old data migration

Data must be migrated from real1 to real2, but new documents in real2 must not be overwritten with old entries. The migration script should therefore use the bulk API with the create operation (not index or update), which fails for a document whose ID already exists instead of replacing it. I use a simple Ruby script, es-reindex, which shows a nice E.T.A. status:

$ ruby es-reindex.rb http://esserver:9200/real1 http://esserver:9200/real2 
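To illustrate why create matters here, this is roughly what one entry of such a bulk request looks like. The sketch assumes the typeless bulk format (Elasticsearch 7 or later; older versions also need a _type in the action line). The document ID, the document body, and the function names are made up.

```shell
#!/bin/sh
# Host taken from the examples in this answer; override via ES_URL.
ES_URL="${ES_URL:-http://esserver:9200}"

# Print one bulk entry: an action line using "create", then the source line.
# $1 = target index, $2 = document ID, $3 = document source as one-line JSON.
bulk_create_entry() {
  printf '{"create":{"_index":"%s","_id":"%s"}}\n%s\n' "$1" "$2" "$3"
}

# Send a bulk body read from stdin (needs a reachable cluster).
# --data-binary keeps the newlines that the bulk format depends on.
send_bulk() {
  curl -s -XPOST "$ES_URL/_bulk" \
    -H 'Content-Type: application/x-ndjson' \
    --data-binary @-
}

# Example:
#   bulk_create_entry real2 42 '{"title":"old document"}' | send_bulk
```

If a document with that ID was already written to real2 by a client, the create item is reported as a conflict and the newer document stays untouched.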

UPDATE 2017: You may consider the new Reindex API instead of the script. It has a lot of interesting features, such as conflict reporting.
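A minimal sketch of the same migration with the Reindex API might look like this. It uses op_type create so that existing documents in the destination are not overwritten, and conflicts set to proceed so that those collisions do not abort the run. The function names are hypothetical.

```shell
#!/bin/sh
# Host taken from the examples in this answer; override via ES_URL.
ES_URL="${ES_URL:-http://esserver:9200}"

# Print a _reindex body copying index $1 into index $2 without overwriting.
reindex_body() {
  printf '{"conflicts":"proceed","source":{"index":"%s"},"dest":{"index":"%s","op_type":"create"}}\n' "$1" "$2"
}

# Start the reindex (needs a reachable cluster).
run_reindex() {
  curl -s -XPOST "$ES_URL/_reindex" \
    -H 'Content-Type: application/json' \
    -d "$(reindex_body "$1" "$2")"
}

# Example: run_reindex real1 real2
```

With conflicts set to proceed, documents that already exist in real2 are counted as version conflicts in the response rather than stopping the request.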

4. Reader alias switch

Now real2 is up to date and clients are writing to it, but they are still reading from real1. Let's update the reader alias (the Content-Type header is required by Elasticsearch 6.0 and later):

curl -XPOST 'http://esserver:9200/_aliases' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "remove" : { "index" : "real1", "alias" : "real_read" } },
        { "add" : { "index" : "real2", "alias" : "real_read" } }
    ]
}'

5. Backup and delete old index

Writes and reads now go to real2. You can back up and delete the real1 index from the ES cluster.
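Before deleting, it may be worth checking that no alias still points to the old index. The sketch below does that with a plain string match on the compact JSON that Elasticsearch returns by default (so it would not work with ?pretty). The function names are hypothetical, and the backup itself is not covered.

```shell
#!/bin/sh
# Host taken from the examples in this answer; override via ES_URL.
ES_URL="${ES_URL:-http://esserver:9200}"

# Succeed only if the given "GET /<index>/_alias" response lists no aliases.
has_no_aliases() {
  case "$1" in
    *'"aliases":{}'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Delete index $1 only when nothing is aliased to it any more
# (needs a reachable cluster).
delete_if_unaliased() {
  response="$(curl -s "$ES_URL/$1/_alias")"
  if has_no_aliases "$response"; then
    curl -s -XDELETE "$ES_URL/$1"
  else
    echo "refusing to delete $1: aliases still point to it" >&2
    return 1
  fi
}

# Example: delete_if_unaliased real1
```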

Done!

gertas answered Sep 20 '22 17:09