I have the following data to be indexed on ElasticSearch.
I want to implement an autocomplete feature, and highlight why a specific document matched a query.
This are the settings of my index:
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 15
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"autocomplete_filter"
]
}
}
}
}
}
Index Analyzing
So the Inverted Index looks like:
This is how i defined the mappings for a name field:
{
"index_type": {
"properties": {
"name": {
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
When I query:
GET http://localhost:9200/index/type/_search
{
"query": {
"match": {
"name": "soft"
}
},
"highlight": {
"fields" : {
"name" : {}
}
}
}
Search for: soft
Applying the Standard Tokenizer, the "soft" is the term, to find on the inverted index. This search matches the Documents: 1, 3, 4, 5, 6, 7 which is correct, but the highlighted part I would expect to be "soft" and not the whole word:
{
"hits": [
{
"_source": {
"name": "SoftwareRocks everytime"
},
"highlight": {
"name": [
"<em>SoftwareRocks</em> everytime"
]
}
},
{
"_source": {
"name": "Software AG"
},
"highlight": {
"name": [
"<em>Software</em> AG"
]
}
},
{
"_source": {
"name": "Software AG2"
},
"highlight": {
"name": [
"<em>Software</em> AG2"
]
}
},
{
"_source": {
"name": "Op Software AG good software better"
},
"highlight": {
"name": [
"Op <em>Software</em> AG good <em>software</em> better"
]
}
},
{
"_source": {
"name": "Op Software AG"
},
"highlight": {
"name": [
"Op <em>Software</em> AG"
]
}
},
{
"_source": {
"name": "is soft ware ok"
},
"highlight": {
"name": [
"is <em>soft</em> ware ok"
]
}
}
]
}
Search for: software ag
Applying the Standard Tokenizer, the "software ag" is transformed into "software" and "ag", to find on the inverted index. This search matches the Documents: 1, 3, 4, 5, 6, which is correct, but the highlighted part I would expect to be "software" and "ag" and not the whole word around "software" and "ag":
{
"hits": [
{
"_source": {
"name": "Software AG"
},
"highlight": {
"name": [
"<em>Software</em> <em>AG</em>"
]
}
},
{
"_source": {
"name": "Software AG2"
},
"highlight": {
"name": [
"<em>Software</em> <em>AG2</em>"
]
}
},
{
"_source": {
"name": "Op Software AG"
},
"highlight": {
"name": [
"Op <em>Software</em> <em>AG</em>"
]
}
},
{
"_source": {
"name": "Op Software AG good software better"
},
"highlight": {
"name": [
"Op <em>Software</em> <em>AG</em> good <em>software</em> better"
]
}
},
{
"_source": {
"name": "SoftwareRocks everytime"
},
"highlight": {
"name": [
"<em>SoftwareRocks</em> everytime"
]
}
}
]
}
I read the highlight documentation on elasticsearch, but I cannot understand how the highlighting is performed. For the two examples above I expect only the matched token on the inverted index to be highlighted and not the whole word. Can anyone help how to highlight only the passed value?
Update
So, in seems that on ElasticSearch website, the autocomplete on the server side is similar to my implementation. However it seems that they highlight the matched query on the client. If they do like this, I started to think that there is not a proper solution to do it on ElasticSearch side, so I implemented the highlight feature on server side instead of on client side(as they seem to do).
My implementation on server side(using PHP) is:
public function search($term)
{
$params = [
'index' => $this->getIndexName(),
'type' => $this->getIndexType(),
'body' => [
'query' => [
'match' => [
'name' => $term
]
]
]
];
$results = $this->client->search($params);
$hits = $results['hits']['hits'];
$data = [];
$wrapBefore = '<strong>';
$wrapAfter = '</strong>';
foreach ($hits as $hit) {
$data[] = [
$hit['_source']['id'],
$hit['_source']['name'],
preg_replace("/($term)/i", "$wrapBefore$1$wrapAfter", strip_tags($hit['_source']['name']))
];
}
return $data;
}
Outputs what I aimed with this question:
I added a bounty to see if there is a solution at ElasticSearch level to achive what I described above.
As of now with latest version of elastic this is not possible as highligh documentation don't refer any settings or query for this. I checked elastic autocomplete example in browser console under xhr requests tab and found the response for "att" autocomplete response for keyword as follows.
url - https://search.elastic.co/suggest?q=att
{
"current_page": 1,
"last_page": 4,
"total_hits": 49,
"hits": [
{
"tags": [],
"url": "/elasticon/tour/2016/jp/not-attending",
"section": "Elasticon",
"title": "Not <em>Attending</em> - JP"
},
{
"section": "Elasticon",
"title": "<em>Attending</em> from Training - JP",
"tags": [],
"url": "/elasticon/tour/2016/jp/attending-training"
},
{
"tags": [],
"url": "/elasticon/tour/2016/jp/attending-keynote",
"title": "<em>Attending</em> from Keynote - JP",
"section": "Elasticon"
},
{
"tags": [],
"url": "/elasticon/tour/2016/not-attending",
"section": "Elasticon",
"title": "Thank You - Not <em>Attending</em>"
},
{
"tags": [],
"url": "/elasticon/tour/2016/attending",
"section": "Elasticon",
"title": "Thank You - <em>Attending</em>"
},
{
"section": "Blog",
"title": "What It's Like to <em>Attend</em> Elastic Training",
"tags": [],
"url": "/blog/what-its-like-to-attend-elastic-training"
},
{
"tags": "Elasticsearch",
"url": "/guide/en/elasticsearch/plugins/5.0/mapper-attachments-highlighting.html",
"section": "Docs/",
"title": "Highlighting <em>attachments</em>"
},
{
"title": "<em>attachments</em> » email",
"section": "Docs/",
"tags": "Logstash",
"url": "/guide/en/logstash/5.0/plugins-outputs-email.html#plugins-outputs-email-attachments"
},
{
"section": "Docs/",
"title": "Configuring Email <em>Attachments</em> » Actions",
"tags": "Watcher",
"url": "/guide/en/watcher/2.4/actions.html#configuring-email-attachments"
},
{
"url": "/guide/en/watcher/2.4/actions.html#hipchat-action-attributes",
"tags": "Watcher",
"title": "HipChat Action <em>Attributes</em> » Actions",
"section": "Docs/"
},
{
"title": "Slack Action <em>Attributes</em> » Actions",
"section": "Docs/",
"tags": "Watcher",
"url": "/guide/en/watcher/2.4/actions.html#slack-action-attributes"
}
],
"aggs": {
"sections": [
{
"Elasticon": 5
},
{
"Blog": 1
},
{
"Docs/": 43
}
],
"top_tags": [
{
"XPack": 14
},
{
"Elasticsearch": 12
},
{
"Watcher": 9
},
{
"Logstash": 4
},
{
"Clients": 3
},
{
"Shield": 1
}
]
}
}
But on frontend they are showing "att" only highlighted on in the autosuggest results. Hence they are handling the highlight stuff on browser layer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With