I have an Elasticsearch index with some data. I implemented a did-you-mean
feature so that when the user types something misspelled, they receive a suggestion with the correct words.
I used the phrase suggester because I need suggestions for short phrases, such as names. The problem is that some suggestions do not exist in the index.
Example:
document in the index: coding like a master
search: Codning like a boss
suggestion: <em>coding</em> like a boss
search result: not found
My problem is that no phrase in my index matches the suggestion, so the suggester recommends phrases that do not exist and that therefore return no search results.
What can I do about this? Shouldn't the phrase suggester only suggest phrases that actually exist in the index?
Here are the corresponding query, settings, and mappings in case you need them.
Settings and mappings
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 1,
"search.slowlog.threshold.fetch.warn": "2s",
"index.analysis.analyzer.default.filter.0": "standard",
"index.analysis.analyzer.default.tokenizer": "standard",
"index.analysis.analyzer.default.filter.1": "lowercase",
"index.analysis.analyzer.default.filter.2": "asciifolding",
"index.priority": 3,
"analysis": {
"analyzer": {
"suggests_analyzer": {
"tokenizer": "lowercase",
"filter": [
"lowercase",
"asciifolding",
"shingle_filter"
],
"type": "custom"
}
},
"filter": {
"shingle_filter": {
"min_shingle_size": 2,
"max_shingle_size": 3,
"type": "shingle"
}
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"suggest_field": {
"analyzer": "suggests_analyzer",
"type": "string"
}
}
}
}
}
Query
{
"DidYouMean": {
"text": "Codning like a boss",
"phrase": {
"field": "suggest_field",
"size": 1,
"gram_size": 1,
"confidence": 2.0
}
}
}
Thanks for your help.
This is actually expected. If you analyze your document with the analyze API, you will get a better picture of what is happening.
GET suggest_index/_analyze?text=coding like a master&analyzer=suggests_analyzer
This is the output
{
"tokens": [
{
"token": "coding",
"start_offset": 0,
"end_offset": 6,
"type": "word",
"position": 1
},
{
"token": "coding like",
"start_offset": 0,
"end_offset": 11,
"type": "shingle",
"position": 1
},
{
"token": "coding like a",
"start_offset": 0,
"end_offset": 13,
"type": "shingle",
"position": 1
},
{
"token": "like",
"start_offset": 7,
"end_offset": 11,
"type": "word",
"position": 2
},
{
"token": "like a",
"start_offset": 7,
"end_offset": 13,
"type": "shingle",
"position": 2
},
{
"token": "like a master",
"start_offset": 7,
"end_offset": 20,
"type": "shingle",
"position": 2
},
{
"token": "a",
"start_offset": 12,
"end_offset": 13,
"type": "word",
"position": 3
},
{
"token": "a master",
"start_offset": 12,
"end_offset": 20,
"type": "shingle",
"position": 3
},
{
"token": "master",
"start_offset": 14,
"end_offset": 20,
"type": "word",
"position": 4
}
]
}
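The token list above can be reproduced with a rough sketch. This is only a toy emulation of a lowercase tokenizer followed by a shingle filter that also emits unigrams; the function name `shingle_tokens` is made up for illustration, and splitting on whitespace is a simplification of what the real tokenizer does.

```python
def shingle_tokens(text, min_size=2, max_size=3):
    """Toy emulation (not Elasticsearch code): lowercase the text, split it
    on whitespace, then emit each word followed by the shingles that start
    at that word."""
    words = text.lower().split()
    tokens = []
    for i in range(len(words)):
        tokens.append(words[i])  # the unigram itself
        for size in range(min_size, max_size + 1):
            if i + size <= len(words):
                tokens.append(" ".join(words[i:i + size]))
    return tokens


indexed = set(shingle_tokens("coding like a master"))

# Check each word of the suggested phrase against the indexed tokens.
for word in "coding like a boss".split():
    print(word, word in indexed)
```

In this sketch "coding", "like" and "a" are found among the indexed tokens, while "boss" is not. The corrected term comes from the index, but nothing here checks the phrase as a whole.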
As you can see, the token "coding" is generated for the text, so it is in your index. The suggester corrects individual terms against the indexed tokens; the correction it proposes is not missing from the index, but it also does not check that the whole suggested phrase exists.
If you strictly want whole-phrase suggestions, you might want to consider using the keyword tokenizer. For example, if you change your mapping to something like this:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"suggests_analyzer": {
"tokenizer": "lowercase",
"filter": [
"lowercase",
"asciifolding",
"shingle_filter"
],
"type": "custom"
},
"raw_analyzer": {
"tokenizer": "keyword",
"filter": [
"lowercase",
"asciifolding"
]
}
},
"filter": {
"shingle_filter": {
"min_shingle_size": 2,
"max_shingle_size": 3,
"type": "shingle"
}
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"suggest_field": {
"analyzer": "suggests_analyzer",
"type": "string",
"fields": {
"raw": {
"analyzer": "raw_analyzer",
"type": "string"
}
}
}
}
}
}
}
then this query will give you the expected results:
{
"DidYouMean": {
"text": "codning lke a master",
"phrase": {
"field": "suggest_field.raw",
"size": 1,
"gram_size": 1
}
}
}
It won't show anything for "codning like a boss".
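One way to read this behaviour: with the keyword tokenizer the whole phrase is indexed as a single term, so a candidate has to be close to the entire input, not just to one word. The sketch below assumes an edit-distance limit of 2 (the documented default of the candidate generator's max_edits setting) and uses plain Levenshtein distance as a stand-in for what Elasticsearch does internally. The names `levenshtein` and `MAX_EDITS` are only illustrative.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance
    (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


MAX_EDITS = 2  # assumption: the default edit-distance limit for candidates
indexed_term = "coding like a master"  # the single keyword token

for query in ("codning lke a master", "codning like a boss"):
    distance = levenshtein(query, indexed_term)
    print(query, "->", "candidate" if distance <= MAX_EDITS else "no candidate")
```

Under these assumptions the first input is two edits away from the indexed term and qualifies, while the second is too far from anything in the index, so nothing is suggested.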
EDIT 1
2) From your comments, and from running some phrase suggestions on my own dataset, I think a much better approach is to use the collate option that the phrase suggester provides. It checks every suggestion against a query and returns a suggestion only if it would match at least one document in the index. I have also added stemmers to the mapping so that only root words are considered. I am using light_english because it is less aggressive.
The analyzer part of the mapping now looks like this:
"analysis": {
"analyzer": {
"suggests_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"english_possessive_stemmer",
"light_english_stemmer",
"asciifolding",
"shingle_filter"
],
"type": "custom"
}
},
"filter": {
"light_english_stemmer": {
"type": "stemmer",
"language": "light_english"
},
"english_possessive_stemmer": {
"type": "stemmer",
"language": "possessive_english"
},
"shingle_filter": {
"min_shingle_size": 2,
"max_shingle_size": 4,
"type": "shingle"
}
}
}
Now this query will give you the desired results:
{
"suggest" : {
"text" : "appel on the tabel",
"simple_phrase" : {
"phrase" : {
"field" : "suggest_field",
"size" : 5,
"collate": {
"query": {
"inline" : {
"match_phrase": {
"{{field_name}}" : "{{suggestion}}"
}
}
},
"params": {"field_name" : "suggest_field"},
"prune": false
}
}
}
},
"size": 0
}
This will give you back "apple on the table".
Here a match_phrase query is used, which runs every suggested phrase against the index. You can set "prune": true to see all suggestions regardless of whether they match. You might also want to consider using a stop filter to avoid stopwords.
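To make the effect of collate and prune concrete, here is a small sketch in plain Python. It is a toy emulation, not how Elasticsearch implements it: `contains_phrase` is a made-up stand-in for match_phrase (no analyzer, no stemming), and the documents and candidate suggestions are invented for the example.

```python
def contains_phrase(doc, phrase):
    """Toy stand-in for match_phrase: the words of the phrase must appear
    consecutively and in order in the document."""
    d = doc.lower().split()
    p = phrase.lower().split()
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))


def collate(suggestions, docs, prune=False):
    """Keep only suggestions that match some document. With prune=True,
    keep everything but flag each suggestion with collate_match."""
    results = []
    for text in suggestions:
        matched = any(contains_phrase(doc, text) for doc in docs)
        if matched or prune:
            entry = {"text": text}
            if prune:
                entry["collate_match"] = matched
            results.append(entry)
    return results


docs = ["apple on the table", "coding like a master"]  # hypothetical index
candidates = ["apple on the table", "apple on the tablet"]  # hypothetical suggestions

print(collate(candidates, docs))
print(collate(candidates, docs, prune=True))
```

With prune left at false only the suggestion that matches a document survives; with prune set to true both come back, each marked with whether it matched.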
Hope this helps!!