I'm trying to build an autocomplete feature with AngularJS and Elasticsearch on a given field, for example COUNTRYNAME. It can contain simple names like "France" or "Spain", as well as composed names like "Sierra Leone".
In the mapping this field is not_analyzed, to prevent Elasticsearch from tokenizing the composed names:
"COUNTRYNAME" : {"type" : "string", "store" : "yes","index": "not_analyzed" }
I need to query Elasticsearch as the user types, but I can't get wildcards to work against this not_analyzed field: the wildcard appended to the typed value matches nothing, and the search is case sensitive.
A wildcard alone does work here:
curl -XGET 'localhost:9200/botanic/specimens/_search?size=0' -d '{
  "fields": ["COUNTRYNAME"],
  "query": {
    "query_string": {
      "query": "COUNTRYNAME:*"
    }
  },
  "aggs": {
    "general": {
      "terms": {
        "field": "COUNTRYNAME",
        "size": 0
      }
    }
  }
}'
But this one does not work (Franc*):
curl -XGET 'localhost:9200/botanic/specimens/_search?size=0' -d '{
  "fields": ["COUNTRYNAME"],
  "query": {
    "query_string": {
      "query": "COUNTRYNAME:Franc*"
    }
  },
  "aggs": {
    "general": {
      "terms": {
        "field": "COUNTRYNAME",
        "size": 0
      }
    }
  }
}'
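(A likely cause, assuming an Elasticsearch 1.x/2.x cluster as the mapping syntax suggests: query_string lowercases wildcard terms by default via lowercase_expanded_terms, so Franc* is actually searched as franc*, which cannot match the not_analyzed term France. A sketch of the same query with that option disabled:)
curl -XGET 'localhost:9200/botanic/specimens/_search?size=0' -d '{
  "query": {
    "query_string": {
      "query": "COUNTRYNAME:Franc*",
      "lowercase_expanded_terms": false
    }
  }
}'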
I also tried a bool query with must, but that doesn't work with this not_analyzed field and a wildcard either:
curl -XGET 'localhost:9200/botanic/specimens/_search?size=0' -d '{
  "fields": ["COUNTRYNAME"],
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "COUNTRYNAME": "Franc*"
          }
        }
      ]
    }
  },
  "aggs": {
    "general": {
      "terms": {
        "field": "COUNTRYNAME",
        "size": 0
      }
    }
  }
}'
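(Note: a match query analyzes its input and treats * as a literal character, so wildcard syntax has no effect there. The dedicated wildcard query skips analysis entirely; a minimal sketch against the same field, which stays case sensitive on a not_analyzed term:)
curl -XGET 'localhost:9200/botanic/specimens/_search' -d '{
  "query": {
    "wildcard": {
      "COUNTRYNAME": "Franc*"
    }
  }
}'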
What am I missing or doing wrong? Should I leave the field analyzed in the mapping and use another analyzer that doesn't split composed names into tokens?
A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. For instance, a whitespace tokenizer breaks text into tokens whenever it sees whitespace: it would convert the text "Quick brown fox!" into the terms [Quick, brown, fox!].
Elasticsearch uses the standard tokenizer by default, which splits words based on grammar and punctuation. Besides standard, there are several off-the-shelf tokenizers: keyword, N-gram, pattern, whitespace, lowercase and a few others.
The keyword tokenizer is a "noop" tokenizer that accepts whatever text it is given and outputs the exact same text as a single term. It can be combined with token filters to normalise output, e.g. lower-casing email addresses.
Analyzers and normalizers both convert text into tokens that can be searched: an analyzer uses a tokenizer to produce one or more tokens per text field, while a normalizer uses only character filters and token filters to produce a single token.
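As a quick check of that combination (again in the 1.x _analyze form; parameter names vary across versions), a keyword tokenizer plus a lowercase filter yields exactly one lowercased token:
curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&filters=lowercase' -d 'Sierra Leone'
# returns a single token: "sierra leone"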
I found a working solution: the keyword tokenizer. Create a custom analyzer that uses it, and apply that analyzer in the mapping to every field that must not be split on spaces:
curl -XPUT 'localhost:9200/botanic/' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "keylower": {
            "tokenizer": "keyword",
            "filter": "lowercase"
          }
        }
      }
    }
  },
  "mappings": {
    "specimens": {
      "_all": {"enabled": true},
      "_index": {"enabled": true},
      "_id": {"index": "not_analyzed", "store": false},
      "properties": {
        "_id": {"type": "string", "store": "no", "index": "not_analyzed"},
        ...
        "LOCATIONID": {"type": "string", "store": "yes", "index": "not_analyzed"},
        "AVERAGEALTITUDEROUNDED": {"type": "string", "store": "yes", "index": "analyzed"},
        "CONTINENT": {"type": "string", "analyzer": "keylower"},
        "COUNTRYNAME": {"type": "string", "analyzer": "keylower"},
        "COUNTRYCODE": {"type": "string", "store": "yes", "index": "analyzed"},
        "COUNTY": {"type": "string", "analyzer": "keylower"},
        "LOCALITY": {"type": "string", "analyzer": "keylower"}
      }
    }
  }
}'
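Analyzers apply at index time, so this only works on a freshly created index (as above) or after reindexing existing documents. The custom analyzer can be verified against the new index; a sketch using the same 1.x _analyze form:
curl -XGET 'localhost:9200/botanic/_analyze?analyzer=keylower' -d 'Bolivia, Plurinational State of'
# returns one token: "bolivia, plurinational state of" (kept whole, lowercased)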
So I can now use a wildcard in the query on the COUNTRYNAME field, which is no longer split on spaces (the query term is lowercase, matching the lowercased indexed terms):
curl -XGET 'localhost:9200/botanic/specimens/_search?size=10' -d '{
  "fields": ["COUNTRYNAME"],
  "query": {
    "query_string": {
      "query": "COUNTRYNAME:bol*"
    }
  },
  "aggs": {
    "general": {
      "terms": {
        "field": "COUNTRYNAME",
        "size": 0
      }
    }
  }
}'
The result (note that the aggregation bucket key is lowercase, because the keylower analyzer lowercases terms at index time):
{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 45,
    "max_score": 1.0,
    "hits": [{
      "_index": "botanic",
      "_type": "specimens",
      "_id": "91E7B53B61DF4E76BF70C780315A5DFD",
      "_score": 1.0,
      "fields": {
        "COUNTRYNAME": ["Bolivia, Plurinational State of"]
      }
    }, {
      "_index": "botanic",
      "_type": "specimens",
      "_id": "7D811B5D08FF4F17BA174A3D294B5986",
      "_score": 1.0,
      "fields": {
        "COUNTRYNAME": ["Bolivia, Plurinational State of"]
      }
    } ...
    ]
  },
  "aggregations": {
    "general": {
      "buckets": [{
        "key": "bolivia, plurinational state of",
        "doc_count": 45
      }]
    }
  }
}