Getting distinct values using NEST ElasticSearch client

Tags: c#, .net, elasticsearch, nest

I'm building a product search engine with Elasticsearch in my .NET application using the NEST client, and there is one thing I'm having trouble with: getting a distinct set of values.

I'm searching for products, of which there are many thousands, but of course I can only return 10 or 20 at a time to the user, and paging works fine for this. Besides this primary result, I want to show my users a list of the brands found within the complete search, so they can use them for filtering.

I have read that I should use a terms aggregation for this, but I couldn't get anything better than the code below. It still doesn't give me what I want, because it splits values like "20th Century Fox" into 3 separate values.

    var brandResults = client.Search<Product>(s => s
        .Query(query)
        .Aggregations(a => a
            .Terms("my_terms_agg", t => t.Field(p => p.BrandName).Size(250))
        )
    );

    var agg = brandResults.Aggs.Terms("my_terms_agg");

Is this even the right approach, or should I use something totally different? And how can I get the correct, complete values, not split on spaces? (I guess splitting is what you get when you ask for a list of "terms".)

What I'm looking for is what you would get from this query in MS SQL:

    SELECT DISTINCT BrandName FROM [Table To Search] WHERE [Where clause without paging]

asked Feb 23 '15 15:02

Bart

2 Answers

You are correct that what you want is a terms aggregation. The problem you're running into is that ES analyzes the "BrandName" field when it indexes it, so the aggregation returns the individual tokens rather than the full value. This is the expected default behavior of a string field in ES.

What I recommend is that you change BrandName into a multi-field. This will allow you to search on all the various parts, as well as run a terms aggregation on the "not_analyzed" version (that is, the full "20th Century Fox" term).

Here is the documentation from ES.

https://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html

[UPDATE] If you are using ES version 1.4 or newer, the syntax for multi-fields is a little different.

https://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html#_multi_fields

Here is a full working sample that illustrates the point in ES 1.4.4. Note that the mapping specifies a "not_analyzed" version of the field.

PUT hilden1

PUT hilden1/type1/_mapping
{
  "properties": {
    "brandName": {
      "type": "string",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

POST hilden1/type1
{
  "brandName": "foo"
}

POST hilden1/type1
{
  "brandName": "bar"
}

POST hilden1/type1
{
  "brandName": "20th Century Fox"
}

POST hilden1/type1
{
  "brandName": "20th Century Fox"
}

POST hilden1/type1
{
  "brandName": "foo bar"
}

GET hilden1/type1/_search
{
  "size": 0, 
  "aggs": {
    "analyzed_field": {
      "terms": {
        "field": "brandName",
        "size": 10
      }
    },
    "non_analyzed_field": {
      "terms": {
        "field": "brandName.raw",
        "size": 10
      }
    }    
  }
}

Results of the last query:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 5,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "non_analyzed_field": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "20th Century Fox",
               "doc_count": 2
            },
            {
               "key": "bar",
               "doc_count": 1
            },
            {
               "key": "foo",
               "doc_count": 1
            },
            {
               "key": "foo bar",
               "doc_count": 1
            }
         ]
      },
      "analyzed_field": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "20th",
               "doc_count": 2
            },
            {
               "key": "bar",
               "doc_count": 2
            },
            {
               "key": "century",
               "doc_count": 2
            },
            {
               "key": "foo",
               "doc_count": 2
            },
            {
               "key": "fox",
               "doc_count": 2
            }
         ]
      }
   }
}

Notice that the not-analyzed field keeps "20th Century Fox" and "foo bar" together, whereas the analyzed field breaks them up into separate lowercase tokens.
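On the NEST side, the same aggregation could be pointed at the `brandName.raw` sub-field. The following is only a minimal sketch and has not been run against a cluster: `client`, `query` and `Product` are taken from the question, the aggregation name "brands" is made up for illustration, the index is assumed to use the multi-field mapping shown above, and the `Items` property is what I would expect from NEST 1.x (other client versions may expose the buckets under a different name).

```csharp
// Sketch only: `client`, `query` and `Product` come from the question.
// Assumes the index has the "brandName.raw" not_analyzed sub-field
// from the mapping above.
var brandResults = client.Search<Product>(s => s
    .Query(query)
    .Size(0) // only the aggregation is needed here, not the hits
    .Aggregations(a => a
        .Terms("brands", t => t          // "brands" is a hypothetical aggregation name
            .Field("brandName.raw")      // aggregate on the untokenized value
            .Size(250)
        )
    )
);

// Assumption: NEST 1.x exposes the buckets as Items.
var brands = brandResults.Aggs.Terms("brands");
foreach (var bucket in brands.Items)
{
    // Key is the full brand name, DocCount the number of matching products
    Console.WriteLine("{0} ({1})", bucket.Key, bucket.DocCount);
}
```

Keep in mind that `Size(250)` caps the number of buckets returned, so this only behaves like `SELECT DISTINCT` when the search matches no more than 250 different brands.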

answered Sep 28 '22 01:09

jhilden

I had a similar issue. I was displaying search results and wanted to show counts on the category and sub category.

You're right to use aggregations. I also had the issue with the strings being tokenised (i.e. "20th Century Fox" being split). This happens because the fields are analysed. For me, the fix was to add the following mapping, which tells ES not to analyse those fields:

    "category": {
      "type": "nested",
      "properties": {
        "CategoryNameAndSlug": {
          "type": "string",
          "index": "not_analyzed"
        },
        "SubCategoryNameAndSlug": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }

As jhilden suggested, if you use this field for more than one purpose (e.g. search and aggregation), you can set it up as a multi-field. That way one version is analysed and used for searching, while the other is left unanalysed and used for aggregation.

answered Sep 28 '22 01:09

Ali

