
How to search inside multiple indices using Nest ElasticSearch?

I have two indices with the following mappings (abbreviated here):

1) AccountType mapping:

elasticClient.CreateIndex("account", i => i
    .Settings(s => s
        .NumberOfShards(2)
        .NumberOfReplicas(0)
    )
    .Mappings(m => m
        .Map<AccountType>(map => map
            .AutoMap()
            .Properties(p => p
                .Text(c => c
                    .Name(n => n.Name)
                    .Analyzer("standard")
                )
                .Text(c => c
                    .Name(n => n.Description)
                    .Analyzer("standard")
                )
            )
        )
    )
);

2) ProductType mapping:

elasticClient.CreateIndex("product", i => i
    .Settings(s => s
        .NumberOfShards(2)
        .NumberOfReplicas(0)
    )
    .Mappings(m => m
        .Map<ProductType>(map => map
            .AutoMap()
            .Properties(p => p
                .Text(c => c
                    .Name(n => n.Title)
                    .Analyzer("standard")
                )
                .Text(c => c
                    .Name(n => n.Description)
                    .Analyzer("standard")
                )
            )
        )
    )
);

There are a few things I would like to clarify:

1) Is it a good idea to have a single index (account, in my case) with products stored as nested objects? With that design, would I have to re-index (update) the whole account document every time I add or update a product?

2) I want to add search functionality: when the user types in a text box, I would like to get the best matches across both accounts and products, searching the product's title and description plus the account's name and description.

How can I search across multiple indices using NEST? If that is not possible, is it a good idea to get the best matches from each index and then pick the best of the combined results by score?

PS: Here is an example of searching the product index:

var result = elasticClient.Search<ProductType>(s => s
    .Size(10)
    .Query(q => q
        .MultiMatch(m => m
            .Fields(f => f
                .Field(p => p.Title, 1.5)
                .Field(p => p.Description, 0.8)
            )
            .Operator(Operator.Or)
            .Query(query)
        )
    )
);
Simple Code asked Dec 23 '22 11:12


1 Answer

1) Is it a good idea to have a single index (account, in my case) with products stored as nested objects? With that design, would I have to re-index (update) the whole account document every time I add or update a product?

It's generally recommended to have one type per index, and indices created in Elasticsearch 6.0+ can contain only one type. If products are represented as nested objects on an account, then adding a new product to an account requires updating the whole account document (either in your application code, or within Elasticsearch).

2) I want to add search functionality: when the user types in a text box, I would like to get the best matches across both accounts and products, searching the product's title and description plus the account's name and description.

You can search across multiple indices. Check out the documentation on covariant search results; it shows an example of returning multiple different types from one index (this example will be updated for 6.0!), but the same can be done across multiple indices. Here's an example:

// namespaces the example relies on
using System;
using System.Text;
using Nest;

private static void Main()
{
    var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
        .InferMappingFor<AccountType>(i => i
            .IndexName("account")
        )
        .InferMappingFor<ProductType>(i => i
            .IndexName("product")
        )
        // useful for development, to make the request/response bytes
        // available on the response
        .DisableDirectStreaming()
        // indented JSON in requests/responses
        .PrettyJson()
        // log out all requests/responses
        .OnRequestCompleted(callDetails =>
        {
            if (callDetails.RequestBodyInBytes != null)
            {
                Console.WriteLine(
                    $"{callDetails.HttpMethod} {callDetails.Uri} \n" +
                    $"{Encoding.UTF8.GetString(callDetails.RequestBodyInBytes)}");
            }
            else
            {
                Console.WriteLine($"{callDetails.HttpMethod} {callDetails.Uri}");
            }

            Console.WriteLine();

            if (callDetails.ResponseBodyInBytes != null)
            {
                Console.WriteLine($"Status: {callDetails.HttpStatusCode}\n" +
                         $"{Encoding.UTF8.GetString(callDetails.ResponseBodyInBytes)}\n" +
                         $"{new string('-', 30)}\n");
            }
            else
            {
                Console.WriteLine($"Status: {callDetails.HttpStatusCode}\n" +
                         $"{new string('-', 30)}\n");
            }
        });

    var client = new ElasticClient(settings);

    if (client.IndexExists("account").Exists)
        client.DeleteIndex("account");

    client.CreateIndex("account", i => i
        .Settings(s => s
            .NumberOfShards(2)
            .NumberOfReplicas(0)
        )
        .Mappings(m => m
            .Map<AccountType>(map => map
                .AutoMap()
                .Properties(p => p
                    .Text(c => c
                        .Name(n => n.Name)
                        .Analyzer("standard")
                    )
                    .Text(c => c
                        .Name(n => n.Description)
                        .Analyzer("standard")
                    )
                )
            )
        )
    );

    if (client.IndexExists("product").Exists)
        client.DeleteIndex("product");

    client.CreateIndex("product", i => i
        .Settings(s => s
            .NumberOfShards(2)
            .NumberOfReplicas(0)
        )
        .Mappings(m => m
            .Map<ProductType>(map => map
                .AutoMap()
                .Properties(p => p
                    .Text(c => c
                        .Name(n => n.Title)
                        .Analyzer("standard")
                    )
                    .Text(c => c
                        .Name(n => n.Description)
                        .Analyzer("standard")
                    )
                )
            )
        )
    );

    client.IndexMany(new[] {
        new AccountType { Name = "Name 1", Description = "Description 1" },
        new AccountType { Name = "Name 2", Description = "Description 2" },
        new AccountType { Name = "Name 3", Description = "Description 3" },
        new AccountType { Name = "Name 4", Description = "Description 4" },
    });

    client.IndexMany(new[] {
        new ProductType { Title = "Title 1", Description = "Description 1" },
        new ProductType { Title = "Title 2", Description = "Description 2" },
        new ProductType { Title = "Title 3", Description = "Description 3" },
        new ProductType { Title = "Title 4", Description = "Description 4" },
    });

    var indices = Indices.Index(typeof(ProductType)).And(typeof(AccountType));

    client.Refresh(indices);

    var searchResponse = client.Search<object>(s => s
        .Index(indices)
        .Type(Types.Type(typeof(ProductType), typeof(AccountType)))
        // && adds a must clause, unary + wraps a query in a filter clause,
        // and || combines the two sides as should clauses
        .Query(q =>
            (q.MultiMatch(m => m
                .Fields(f => f
                    .Field(Infer.Field<ProductType>(ff => ff.Title, 1.5))
                    .Field(Infer.Field<ProductType>(ff => ff.Description, 0.8))
                )
                .Operator(Operator.Or)
                .Query("Title 1")
            ) && +q.Term("_index", "product")) ||
            (q.MultiMatch(m => m
                .Fields(f => f
                    .Field(Infer.Field<AccountType>(ff => ff.Name, 3))
                    .Field(Infer.Field<AccountType>(ff => ff.Description, 0.3))
                )
                .Operator(Operator.Or)
                .Query("Name 4")
            ) && +q.Term("_index", "account"))
        )
    );

    foreach (var document in searchResponse.Documents)
        Console.WriteLine($"document is a {document.GetType().Name}");
}

public class ProductType
{
    public string Title { get; set; }
    public string Description { get; set; }
}

public class AccountType
{
    public string Name { get; set; }
    public string Description { get; set; }
}

The result is

document is a AccountType
document is a ProductType
document is a AccountType
document is a ProductType
document is a AccountType
document is a AccountType
document is a ProductType
document is a ProductType

There's a lot going on here so let me explain. The search request JSON looks like:

POST http://localhost:9200/product%2Caccount/producttype%2Caccounttype/_search?pretty=true 
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "multi_match": {
                  "query": "Title 1",
                  "operator": "or",
                  "fields": [
                    "title^1.5",
                    "description^0.8"
                  ]
                }
              }
            ],
            "filter": [
              {
                "term": {
                  "_index": {
                    "value": "product"
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "multi_match": {
                  "query": "Name 4",
                  "operator": "or",
                  "fields": [
                    "name^3",
                    "description^0.3"
                  ]
                }
              }
            ],
            "filter": [
              {
                "term": {
                  "_index": {
                    "value": "account"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

The search is executed across both the product and account indices, and across the producttype and accounttype types. A multi_match query on the title and description fields is combined, inside a bool query, with a term query on _index that constrains it to the product index. The term query sits in a filter clause because no relevancy score should be calculated for it. This bool query is paired with a second bool query that runs a multi_match query on the name and description fields, combined with a term query that constrains it to the account index. The two bool queries are should clauses of an outer bool query because a document needs to match only one of them.

object is used as the generic parameter type for the Search<T>() method call because ProductType and AccountType do not share a common base class (besides object!) to which the resulting document collection can be typed. We can see however from the results that NEST has actually deserialized documents with type producttype to instances of ProductType and documents with type accounttype to instances of AccountType.
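
Because the response is typed as object, each document has to be narrowed back to its concrete type before its properties can be used. Below is a minimal sketch of one way to do that with C# 7 pattern matching. HitFormatter and Describe are hypothetical names introduced for illustration, and the two classes are repeated from the example only so the snippet compiles on its own.

```csharp
using System;

public class ProductType
{
    public string Title { get; set; }
    public string Description { get; set; }
}

public class AccountType
{
    public string Name { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper, not part of NEST.
public static class HitFormatter
{
    // Turns one deserialized hit into a display line by switching on its runtime type.
    public static string Describe(object document)
    {
        switch (document)
        {
            case ProductType product:
                return $"Product: {product.Title}";
            case AccountType account:
                return $"Account: {account.Name}";
            default:
                // null or any type the search was not expected to return
                return $"Unknown: {document?.GetType().Name ?? "null"}";
        }
    }
}
```

With the searchResponse from the example, this could be called as Console.WriteLine(HitFormatter.Describe(document)) inside the foreach loop.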

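The query uses NEST's operator overloading to combine queries more succinctly: && and || build the bool query's must and should clauses, and the unary + places a query in a filter clause.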
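
For comparison, here is a sketch of how the same query might be written longhand with NEST's Bool, Should, Must and Filter methods. It is meant to produce the same bool structure as the request JSON shown earlier, but it has not been run against a cluster, so treat it as an illustration only. LonghandQuery and Build are hypothetical names, and the two classes are repeated from the example so the snippet stands alone.

```csharp
using Nest;

public class ProductType
{
    public string Title { get; set; }
    public string Description { get; set; }
}

public class AccountType
{
    public string Name { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper, not part of NEST.
public static class LonghandQuery
{
    // Outer bool query with two should clauses, one per index.
    public static QueryContainer Build(QueryContainerDescriptor<object> q)
    {
        return q.Bool(b => b
            .Should(
                // product side: scored multi_match plus an unscored _index filter
                sh => sh.Bool(bb => bb
                    .Must(mu => mu.MultiMatch(m => m
                        .Fields(f => f
                            .Field(Infer.Field<ProductType>(ff => ff.Title, 1.5))
                            .Field(Infer.Field<ProductType>(ff => ff.Description, 0.8))
                        )
                        .Operator(Operator.Or)
                        .Query("Title 1")
                    ))
                    .Filter(fi => fi.Term("_index", "product"))
                ),
                // account side: same shape with the account fields and index
                sh => sh.Bool(bb => bb
                    .Must(mu => mu.MultiMatch(m => m
                        .Fields(f => f
                            .Field(Infer.Field<AccountType>(ff => ff.Name, 3))
                            .Field(Infer.Field<AccountType>(ff => ff.Description, 0.3))
                        )
                        .Operator(Operator.Or)
                        .Query("Name 4")
                    ))
                    .Filter(fi => fi.Term("_index", "account"))
                )
            )
        );
    }
}
```

In the example's Search<object> call, this would replace the overloaded expression as .Query(q => LonghandQuery.Build(q)). The operator form is shorter; the longhand form makes the must, filter and should clauses explicit.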

Russ Cam answered May 15 '23 23:05