I am planning on extracting (essentially scraping, with permission) some data from a web page and storing it in Elasticsearch (you know, for search).
While I have permission to scrape the data from the site,
When I store this in es, I am planning to put it into one index and into a single mapping type, say thing.
However, over time, the source (the HTML web page) is likely to change as they add/remove/change content of some of these entries. Since there are no identifiers in the source, I can't easily identify new ones (and even worse, deleted ones or changed ones).
I want to keep my es index up to date, and what I am thinking of is some sort of blue-green mechanism:
The live index is index-prod, and the new one built by the process is index-rc (release candidate).
index-rc is promoted to index-prod based on some heuristics (a flexible velocity check on the number of entries, sample queries that we know should work, etc.).
I am planning on hosting the Elasticsearch cluster on the AWS Elasticsearch Service and could possibly concoct something using Route 53 CNAMEs (and maybe ELB?), but I wanted to know if there is more built-in support in Elasticsearch itself for doing this.
Essentially, I want to swap one index's data for another.
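To make the "flexible velocity check" concrete, here is a minimal sketch of what I have in mind. The function name, the 20% threshold, and the entry counts are all hypothetical placeholders, not something taken from a library:

```python
def passes_velocity_check(prod_count, rc_count, max_relative_change=0.2):
    """Return True if the entry count of the release candidate is close
    enough to the entry count of the live index.

    max_relative_change is an assumed tolerance (20% by default).
    """
    if prod_count == 0:
        # Nothing live yet: accept any non-empty candidate.
        return rc_count > 0
    change = abs(rc_count - prod_count) / prod_count
    return change <= max_relative_change


# Hypothetical counts, e.g. taken from a count query against each index.
print(passes_velocity_check(1000, 1050))  # small drift
print(passes_velocity_check(1000, 400))   # suspiciously large drop
```

The sample-query checks would be a second gate of the same shape: run a few known queries against index-rc and refuse to promote if any of them comes back empty.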
You don't need to swap the data between indexes. If I understand it correctly, you can use aliases: have your queries go through an alias that points at the current index, then repoint the alias to the next index version once that one is ready.
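A rough sketch of the alias swap, assuming the cluster accepts plain unsigned HTTP requests (an AWS-hosted domain may need signed requests instead). The alias name things, the index names things-v1 and things-v2, and the base URL are hypothetical. The request body follows the _aliases endpoint, where a remove and an add action sent in one call are applied atomically:

```python
import json
import urllib.request


def build_alias_swap(alias, old_index, new_index):
    """Build the body for POST /_aliases that repoints `alias`
    from old_index to new_index in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def apply_alias_swap(base_url, body):
    """Send the actions to the cluster. base_url is an assumed endpoint,
    e.g. "http://localhost:9200"; authentication is not handled here."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_aliases",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical names: queries use the alias "things" and never a concrete index.
body = build_alias_swap("things", "things-v1", "things-v2")
print(json.dumps(body, indent=2))
```

Note that an alias cannot share its name with an existing index, so in the question's naming index-prod would become the alias and the real indexes would carry versioned names behind it. After a successful swap, the old index can be kept for a quick rollback or deleted.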
To shift query traffic to the new endpoint gradually, I suppose a load balancer such as nginx in front of the cluster is the best solution. There are many write-ups about this on the web.