
NEST Elasticsearch Reindex examples

My objective is to reindex an index with 10 million documents in order to change field mappings to facilitate significant terms analysis.

My problem is that I am having trouble using the NEST library to perform a re-index, and the documentation is (very) limited. If possible I need an example of the following in use:

http://nest.azurewebsites.net/nest/search/scroll.html

http://nest.azurewebsites.net/nest/core/bulk.html

Gillespie asked Sep 18 '14 08:09



2 Answers

NEST provides a nice Reindex method you can use, although the documentation is lacking. I've used it in a very rough-and-ready fashion with this ad-hoc WinForms code.

    private ElasticClient client;
    private double count;

    private void reindex_Completed()
    {
        MessageBox.Show("Done!");
    }

    private void reindex_Next(IReindexResponse<object> obj)
    {
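        count += obj.BulkResponse.Items.Count();
        // Guard against an empty source index and clamp to the progress bar's default 0-100 range.
        var total = obj.SearchResponse.Total;
        var progress = total > 0 ? 100 * count / total : 100;
        progressBar1.Value = (int)Math.Min(100, progress);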
    }

    private void reindex_Error(Exception ex)
    {
        MessageBox.Show(ex.ToString());
    }

    private void button1_Click(object sender, EventArgs e)
    {
        count = 0;

        var reindex = client.Reindex<object>(r => r.FromIndex(fromIndex.Text).NewIndexName(toIndex.Text).Scroll("10s"));

        var o = new ReindexObserver<object>(onError: reindex_Error, onNext: reindex_Next, completed: reindex_Completed);
        reindex.Subscribe(o);
    }

And I've just found the blog post that showed me how to do it: http://thomasardal.com/elasticsearch-migrations-with-c-and-nest/

batwad answered Oct 23 '22 01:10


Unfortunately the NEST implementation is not quite what I expected. In my opinion it's a bit over-engineered for possibly the most common use case.

A lot of people just want to update their mappings with zero downtime...

In my case, I had already created the index with all its settings and mappings, but NEST insists on creating the new index itself when reindexing. It does that and many other things besides, too many for my taste.

I found it much less complicated to implement this directly, since NEST already has Search, Scroll, and Bulk methods (this is adapted from NEST's implementation):

// Assuming you have already created and setup the index yourself
public void Reindex(ElasticClient client, string aliasName, string currentIndexName, string nextIndexName)
{
    Console.WriteLine("Reindexing documents to new index...");
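    // With SearchType.Scan the first response carries only a scroll ID, and Size applies per shard.
    var searchResult = client.Search<object>(s => s
        .Index(currentIndexName)
        .AllTypes()
        .From(0)
        .Size(100)
        .Query(q => q.MatchAll())
        .SearchType(SearchType.Scan)
        .Scroll("2m"));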
    if (searchResult.Total <= 0)
    {
        Console.WriteLine("Existing index has no documents, nothing to reindex.");
    }
    else
    {
        var page = 0;
        IBulkResponse bulkResponse = null;
        do
        {
            var result = searchResult;
            searchResult = client.Scroll<object>(s => s.Scroll("2m").ScrollId(result.ScrollId));
            // Check the scroll response before inspecting its documents; a failed response usually has none.
            searchResult.ThrowOnError("reindex scroll " + page);
            if (searchResult.Documents != null && searchResult.Documents.Any())
            {
                bulkResponse = client.Bulk(b =>
                {
                    foreach (var hit in searchResult.Hits)
                    {
                        b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id));
                    }

                    return b;
                }).ThrowOnError("reindex page " + page);
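                // A scan page can hold more than 100 documents (Size is per shard), so report the actual count.
                Console.WriteLine("Reindexed page " + (page + 1) + " (" + bulkResponse.Items.Count() + " documents)");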
            }

            ++page;
        }
        while (searchResult.IsValid && bulkResponse != null && bulkResponse.IsValid && searchResult.Documents != null && searchResult.Documents.Any());
        Console.WriteLine("Reindexing complete!");
    }

    Console.WriteLine("Updating alias to point to new index...");
    client.Alias(a => a
        .Add(aa => aa.Alias(aliasName).Index(nextIndexName))
        .Remove(aa => aa.Alias(aliasName).Index(currentIndexName)));

    // TODO: Don't forget to delete the old index if you want
}

And the ThrowOnError extension method in case you want it:

public static T ThrowOnError<T>(this T response, string actionDescription = null) where T : IResponse
{
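    if (!response.IsValid)
    {
        // ServerError may be null (for example, if the request never reached the cluster), so guard against it.
        var error = response.ServerError != null ? response.ServerError.Error : "unknown error";
        // Build the prefix separately so the error text is kept even when no description is given.
        var prefix = actionDescription == null ? string.Empty : "Failed to " + actionDescription + ": ";
        throw new CustomExceptionOfYourChoice(prefix + error);
    }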

    return response;
}
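
For reference, a call to the Reindex method above might look something like the sketch below. The cluster URL, the alias, and both index names are placeholders I made up for illustration, and it assumes a NEST 1.x client and that the snippet lives in the same class as Reindex. The target index has to exist already, with the new mappings applied, before you call it.

```csharp
using System;
using Nest;

// Placeholder URL: point this at your own cluster.
var settings = new ConnectionSettings(new Uri("http://localhost:9200"));
var client = new ElasticClient(settings);

// Hypothetical names: "products" is the alias your application queries,
// "products_v1" is the index it currently points to, and "products_v2"
// is the index you created beforehand with the new mappings.
Reindex(client, "products", "products_v1", "products_v2");
```

Because the application only ever talks to the alias, the swap at the end of Reindex is what gives you the zero-downtime switch; the old index can be deleted once you are happy with the new one.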
Ben Wilde answered Oct 23 '22 01:10