Elasticsearch count terms ignoring spaces

Using ES 1.2.1

My aggregation

{
    "size": 0,
    "aggs": {
        "cities": {
            "terms": {
                "field": "city",
                "size": 300000
            }
        }
    }
}

The issue is that some city names contain spaces, and each word is aggregated as a separate term.

For instance, Los Angeles:

{
    "key": "Los",
    "doc_count": 2230
},
{
    "key": "Angeles",
    "doc_count": 2230
},

I assume this has to do with the analyzer. Which one should I use so that values are not split on spaces?

user432024 asked Jun 12 '14 16:06



1 Answer

For fields that you want to aggregate on, I would recommend either using the keyword analyzer or not analyzing the field at all. From the keyword analyzer documentation:

An analyzer of type keyword that "tokenizes" an entire stream as a single token. This is useful for data like zip codes, ids and so on. Note, when using mapping definitions, it might make more sense to simply mark the field as not_analyzed.
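As a minimal sketch of the not_analyzed option, an ES 1.x index could be created with a mapping body like the one below. The type name place is a hypothetical placeholder, not something from the question. Note that the mapping of an existing field cannot be changed in place, so the data would have to be reindexed after the new mapping is set up.

```json
{
    "mappings": {
        "place": {
            "properties": {
                "city": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}
```

With city stored as a single unanalyzed term, the original terms aggregation should return one bucket per full city name, such as "Los Angeles", instead of one bucket per word.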

However, if you still want the field analyzed for other searches, consider using the fields setting of ES 1.x, as described in the field/multi_field documentation. This lets you keep an analyzed version of the field for searching and an unanalyzed version for aggregations.
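A sketch of the multi-field approach, assuming ES 1.x mapping syntax: city stays analyzed for full-text search, while a sub-field holds the unanalyzed value. The type name place and the sub-field name raw are illustrative choices, not names from the question.

```json
{
    "mappings": {
        "place": {
            "properties": {
                "city": {
                    "type": "string",
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            }
        }
    }
}
```

Under that mapping, the aggregation would target the sub-field by its full path, city.raw, while searches continue to use city:

```json
{
    "size": 0,
    "aggs": {
        "cities": {
            "terms": {
                "field": "city.raw",
                "size": 300000
            }
        }
    }
}
```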

Paige Cook answered Oct 14 '22 14:10