
Highlight whole content in Elasticsearch for multivalue fields

Using the highlight feature of Elasticsearch:

"highlight": {
  "fields": {
    "tags": { "number_of_fragments": 0 }
  }
}

With number_of_fragments set to 0, the highlighter does not split the field into fragments; it returns the whole content of the field with the matches highlighted. This is useful for short texts, because documents can be displayed as normal and people can easily scan for the highlighted parts.

How do you use this when a field in the document holds an array of multiple values?

PUT /test/doc/1
{
  "tags": [
    "one hit tag",
    "two foo tag",
    "three hit tag",
    "four foo tag"
  ]
}

GET /test/doc/_search
{
  "query": { 
    "match": { "tags": "hit"} 
  }, 
  "highlight": {
    "fields": {
      "tags": { "number_of_fragments": 0 }
    }
  }
}

This is what I would like to show the user:

1 result:

Document 1, tagged with:

"one hit tag", "two foo tag", "three hit tag", "four foo tag"

Unfortunately, the highlight section of the response contains only the values that matched:

{
     "took": 1,
     "timed_out": false,
     "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
     },
     "hits": {
        "total": 1,
        "max_score": 0.10848885,
        "hits": [
           {
              "_index": "test",
              "_type": "doc",
              "_id": "1",
              "_score": 0.10848885,
              "_source": {
                 "tags": [
                    "one hit tag",
                    "two foo tag",
                    "three hit tag",
                    "four foo tag"
                 ]
              },
              "highlight": {
                 "tags": [
                    "one <em>hit</em> tag",
                    "three <em>hit</em> tag"
                 ]
              }
           }
        ]
     }
  }

How can I get from this response to the full array, with the matches highlighted in place:

   "tags": [
      "one <em>hit</em> tag",
      "two foo tag",
      "three <em>hit</em> tag",
      "four foo tag"
   ]
mlangenberg asked Aug 29 '14 09:08


1 Answer

One possibility is to strip the <em> HTML tags from each highlighted value, look the plain text up in the original field, and replace the matching entry with the highlighted version:

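# Full field content, as returned in _source
tags = [
  "one hit tag",
  "two foo tag",
  "three hit tag",
  "four foo tag"
]
# Only the matching values, as returned in the highlight section
highlighted = [
  "one <em>hit</em> tag",
  "three <em>hit</em> tag"
]

highlighted.each do |highlighted_tag|
  # Strip <em> and </em> to recover the original value, then find its position
  plain_tag = highlighted_tag.gsub(/<\/?em>/, '')
  if (index = tags.index(plain_tag))
    # Swap the plain value for its highlighted version
    tags[index] = highlighted_tag
  end
end

puts tags #=>
# one <em>hit</em> tag
# two foo tag
# three <em>hit</em> tag
# four foo tag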

This won't win a prize for the most beautiful code, but I reckon it gets the job done.
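The same lookup can be wrapped in a small helper that works directly on a parsed search hit. This is only a sketch: merge_highlights is a hypothetical name, not part of any Elasticsearch client, and it assumes the highlighter's default <em> and </em> tags (configurable through pre_tags and post_tags), no HTML encoding of the highlighted text, and field values that do not themselves contain those tags.

```ruby
require 'json'

# Hypothetical helper (not part of any Elasticsearch client): returns a copy
# of `values` in which every entry that was highlighted is replaced by its
# highlighted version. Non-matching entries are kept as-is.
def merge_highlights(values, highlighted, pre_tag: '<em>', post_tag: '</em>')
  # Index the highlighted strings by their plain (tag-free) text.
  by_plain = highlighted.each_with_object({}) do |marked, map|
    plain = marked.gsub(pre_tag, '').gsub(post_tag, '')
    map[plain] = marked
  end
  # Keep the original order; fall back to the plain value when there is no hit.
  values.map { |value| by_plain.fetch(value, value) }
end

# A trimmed-down hit, shaped like the response in the question.
hit = JSON.parse(<<~JSON)
  {
    "_source": {
      "tags": ["one hit tag", "two foo tag", "three hit tag", "four foo tag"]
    },
    "highlight": {
      "tags": ["one <em>hit</em> tag", "three <em>hit</em> tag"]
    }
  }
JSON

merged = merge_highlights(hit['_source']['tags'], hit.dig('highlight', 'tags') || [])
puts merged
```

Unlike the loop above, this returns a new array instead of modifying tags in place, and if the same value occurs more than once in the field, every occurrence gets highlighted rather than only the first.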

mlangenberg answered Oct 22 '22 12:10