I'm configuring a tokenizer that splits words on the underscore character as well as on all other punctuation characters. I decided to use the word_delimiter filter for this, and then set my analyzer as the default for the desired field.
I have two issues with it:
Here is my template, data object, analyzer test and search requests:
PUT simple
{
  "template" : "simple",
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "underscore_splits_words" : {
            "tokenizer" : "standard",
            "filter" : ["word_delimiter"],
            "generate_word_parts" : true,
            "preserve_original" : true
          }
        }
      }
    },
    "mappings": {
      "_default_": {
        "properties" : {
          "request" : { "type" : "string", "analyzer" : "underscore_splits_words" }
        }
      }
    }
  }
}
Data object:
POST simple/0
{ "request" : "GET /queue/1/under_score-hyphenword/poll?ttl=300&limit=10" }
This returns tokens: "under", "score", "hyphenword", but no "underscore_splits_words":
POST simple/_analyze?analyzer=underscore_splits_words
{"/queue/1/under_score-hyphenword/poll?ttl=300&limit=10"}
Search results
Hit:
GET simple/_search?q=hyphenword
Hit:
POST simple/_search
{
  "query": {
    "query_string": {
      "query": "hyphenword"
    }
  }
}
Miss:
GET simple/_search?q=score
Miss:
POST simple/_search
{
  "query": {
    "query_string": {
      "query": "score"
    }
  }
}
Please suggest a correct way to achieve my goal. Thanks!
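You should be able to use the "simple" analyzer for this. There's no need for a custom analyzer, because the simple analyzer uses the lowercase tokenizer, which behaves like the letter tokenizer followed by a lowercase filter (so any non-letter character starts a new token). The reason you are not getting any hits for "score" is that you are not specifying the field in your query, so you are querying the _all field, which is mainly for convenient full-text searching and, by default, is not analyzed with the analyzer you configured for request. The default standard analyzer does not split on underscores, so "under_score" ends up there as a single token, while "hyphenword" is split off at the hyphen, which is why that query still hits.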
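To make the difference concrete, here is a rough Python sketch of the two tokenization behaviours. It does not call Elasticsearch; the regexes and the function names simple_analyzer and standard_like are my own approximations, good enough for this particular string but not a faithful reimplementation of either analyzer.

```python
import re

def simple_analyzer(text):
    # Approximates the "simple" analyzer: lowercase the input and
    # keep only runs of letters (digits, "_" and punctuation split tokens).
    return re.findall(r"[^\W\d_]+", text.lower())

def standard_like(text):
    # Very rough approximation of the default analyzer for this input:
    # digits and underscores stay inside a token, other punctuation splits.
    return re.findall(r"\w+", text.lower())

request = "GET /queue/1/under_score-hyphenword/poll?ttl=300&limit=10"

# "score" is a token of its own only in the first list.
print(simple_analyzer(request))
print(standard_like(request))
```

In the first list "under" and "score" are separate tokens, while in the second "under_score" stays in one piece, which matches the hit for hyphenword and the miss for score that you observed.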
PUT myindex
{
  "mappings": {
    "mytype": {
      "properties": {
        "request": {
          "type": "string",
          "analyzer": "simple"
        }
      }
    }
  }
}
POST myindex/mytype/1
{ "request" : "GET /queue/1/key_word-hyphenword/poll?ttl=300&limit=10" }
GET myindex/mytype/_search?q=request:key
POST myindex/mytype/_search
{
  "query": {
    "query_string": {
      "default_field": "request",
      "query": "key"
    }
  }
}
POST myindex/mytype/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "request": "key" } }
      ]
    }
  }
}
The output from the queries looks correct:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.095891505,
    "hits": [
      {
        "_index": "myindex",
        "_type": "mytype",
        "_id": "1",
        "_score": 0.095891505,
        "_source": {
          "request": "GET /queue/1/key_word-hyphenword/poll?ttl=300&limit=10"
        }
      }
    ]
  }
}
If you want to omit the specific field you're searching (NOT RECOMMENDED), you can set the default analyzer for all mappings in the index when you create the index. (Note: this feature is deprecated, and you shouldn't use it, for performance and stability reasons.)
PUT myindex
{
  "mappings": {
    "_default_": {
      "index_analyzer": "simple"
    }
  }
}
POST myindex/mytype/1
{ "request" : "GET /queue/1/key_word-hyphenword/poll?ttl=300&limit=10" }
GET myindex/mytype/_search?q=key
You will get the same result (1 hit).