I'm trying to build an autocomplete feature with AngularJS and Elasticsearch on a given field, for example COUNTRYNAME. It can contain simple names like "France" or "Spain", as well as composed names like "Sierra Leone".
In the mapping this field is not_analyzed, to prevent Elasticsearch from tokenizing the composed names:
"COUNTRYNAME" : {"type" : "string", "store" : "yes","index": "not_analyzed" }
I need to query Elasticsearch as the user types, but I can't get wildcards to work against this not_analyzed field: the wildcard appended to the typed value matches nothing, and the search is case sensitive.
A wildcard alone does work here:
curl -XGET 'localhost:9200/botanic/specimens/_search?size=0' -d '{
  "fields": ["COUNTRYNAME"],
  "query": {
    "query_string": {
      "query": "COUNTRYNAME:*"
    }
  },
  "aggs": {
    "general": {
      "terms": {
        "field": "COUNTRYNAME",
        "size": 0
      }
    }
  }
}'
But this one does not work (Franc*):
curl -XGET 'localhost:9200/botanic/specimens/_search?size=0' -d '{
  "fields": ["COUNTRYNAME"],
  "query": {
    "query_string": {
      "query": "COUNTRYNAME:Franc*"
    }
  },
  "aggs": {
    "general": {
      "terms": {
        "field": "COUNTRYNAME",
        "size": 0
      }
    }
  }
}'
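(A likely cause, assuming an Elasticsearch 1.x/2.x cluster as the mapping syntax suggests: query_string lowercases wildcard terms by default via lowercase_expanded_terms, so Franc* is actually searched as franc*, which cannot match the not_analyzed term France. A sketch of the same query with that option disabled:)
curl -XGET 'localhost:9200/botanic/specimens/_search?size=0' -d '{
  "query": {
    "query_string": {
      "query": "COUNTRYNAME:Franc*",
      "lowercase_expanded_terms": false
    }
  }
}'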
I also tried a bool query with must, but that doesn't work with this not_analyzed field and a wildcard either:
curl -XGET 'localhost:9200/botanic/specimens/_search?size=0' -d '{
  "fields": ["COUNTRYNAME"],
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "COUNTRYNAME": "Franc*"
          }
        }
      ]
    }
  },
  "aggs": {
    "general": {
      "terms": {
        "field": "COUNTRYNAME",
        "size": 0
      }
    }
  }
}'
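(Note: a match query analyzes its input and treats * as a literal character, so wildcard syntax has no effect there. The dedicated wildcard query skips analysis entirely; a minimal sketch against the same field, which stays case sensitive on a not_analyzed term:)
curl -XGET 'localhost:9200/botanic/specimens/_search' -d '{
  "query": {
    "wildcard": {
      "COUNTRYNAME": "Franc*"
    }
  }
}'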
What am I missing or doing wrong? Should I leave the field analyzed in the mapping and use another analyzer that doesn't split composed names into tokens?
A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. For instance, a whitespace tokenizer breaks text into tokens whenever it sees whitespace: it would convert the text "Quick brown fox!" into the terms [Quick, brown, fox!].
Elasticsearch uses the standard tokenizer by default, which splits words based on grammar and punctuation. Besides standard, there are several off-the-shelf tokenizers: keyword, N-gram, pattern, whitespace, lowercase and a few others.
The keyword tokenizer is a "noop" tokenizer that accepts whatever text it is given and outputs the exact same text as a single term. It can be combined with token filters to normalise output, e.g. lower-casing email addresses.
Analyzers and normalizers both convert text into tokens that can be searched: an analyzer uses a tokenizer to produce one or more tokens per text field, while a normalizer uses only character filters and token filters to produce a single token.
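As a quick check of that combination (again in the 1.x _analyze form; parameter names vary across versions), a keyword tokenizer plus a lowercase filter yields exactly one lowercased token:
curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&filters=lowercase' -d 'Sierra Leone'
# returns a single token: "sierra leone"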
I found a working solution: the keyword tokenizer. Create a custom analyzer that uses it, and apply that analyzer in the mapping to every field that must not be split on spaces:
curl -XPUT 'localhost:9200/botanic/' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "keylower": {
            "tokenizer": "keyword",
            "filter": "lowercase"
          }
        }
      }
    }
  },
  "mappings": {
    "specimens": {
      "_all": {"enabled": true},
      "_index": {"enabled": true},
      "_id": {"index": "not_analyzed", "store": false},
      "properties": {
        "_id": {"type": "string", "store": "no", "index": "not_analyzed"},
        ...
        "LOCATIONID": {"type": "string", "store": "yes", "index": "not_analyzed"},
        "AVERAGEALTITUDEROUNDED": {"type": "string", "store": "yes", "index": "analyzed"},
        "CONTINENT": {"type": "string", "analyzer": "keylower"},
        "COUNTRYNAME": {"type": "string", "analyzer": "keylower"},
        "COUNTRYCODE": {"type": "string", "store": "yes", "index": "analyzed"},
        "COUNTY": {"type": "string", "analyzer": "keylower"},
        "LOCALITY": {"type": "string", "analyzer": "keylower"}
      }
    }
  }
}'
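Analyzers apply at index time, so this only works on a freshly created index (as above) or after reindexing existing documents. The custom analyzer can be verified against the new index; a sketch using the same 1.x _analyze form:
curl -XGET 'localhost:9200/botanic/_analyze?analyzer=keylower' -d 'Bolivia, Plurinational State of'
# returns one token: "bolivia, plurinational state of" (kept whole, lowercased)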
So I can now use a wildcard in the query on the COUNTRYNAME field, which is no longer split on spaces (the query term is lowercase, matching the lowercased indexed terms):
curl -XGET 'localhost:9200/botanic/specimens/_search?size=10' -d '{
  "fields": ["COUNTRYNAME"],
  "query": {
    "query_string": {
      "query": "COUNTRYNAME:bol*"
    }
  },
  "aggs": {
    "general": {
      "terms": {
        "field": "COUNTRYNAME",
        "size": 0
      }
    }
  }
}'
The result (note that the aggregation bucket key is lowercase, because the keylower analyzer lowercases terms at index time):
{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 45,
    "max_score": 1.0,
    "hits": [{
      "_index": "botanic",
      "_type": "specimens",
      "_id": "91E7B53B61DF4E76BF70C780315A5DFD",
      "_score": 1.0,
      "fields": {
        "COUNTRYNAME": ["Bolivia, Plurinational State of"]
      }
    }, {
      "_index": "botanic",
      "_type": "specimens",
      "_id": "7D811B5D08FF4F17BA174A3D294B5986",
      "_score": 1.0,
      "fields": {
        "COUNTRYNAME": ["Bolivia, Plurinational State of"]
      }
    } ...
    ]
  },
  "aggregations": {
    "general": {
      "buckets": [{
        "key": "bolivia, plurinational state of",
        "doc_count": 45
      }]
    }
  }
}