
Does analyzer prevent fields from highlighting?

Could you help me with a small problem regarding language-specific analyzers and highlighting in Elasticsearch?

I need to search documents by a query string and highlight the matched strings. Here is my mapping:

{
    "usr": {
        "properties": {
            "text0": {
                "type": "string",
                "analyzer": "english"
            },
            "text1": {
                "type": "string"
            }
        }
    }
}

Note that the "english" analyzer is set for the "text0" field, while the "text1" field uses the standard analyzer by default.

In my index there is one document for now:

"hits": [{
    "_index": "tt",
    "_type": "usr",
    "_id": "AUxvIPAv84ayQMZV-3Ll",
    "_score": 1,
    "_source": {
        "text0": "highlighted. need to be highlighted.",
        "text1": "highlighted. need to be highlighted."
    }
}]

Consider the following query:

{
    "query": {
        "query_string" : {
            "query" : "highlighted"
        }
    },
    "highlight" : {
        "fields" : {
            "*" : {}
        }
    }
}

I expected each field in the document to be highlighted, but highlighting appeared only in the "text1" field (where no analyzer is set):

"hits": [{
    "_type": "usr", 
    "_source": {
        "text0": "highlighted. need to be highlighted.", 
        "text1": "highlighted. need to be highlighted."
    }, 
    "_score": 0.19178301, 
    "_index": "tt", 
    "highlight": {
        "text1": [
            "<em>highlighted</em>. need to be <em>highlighted</em>."
        ]
    }, 
    "_id": "AUxvIPAv84ayQMZV-3Ll"
}]

Now consider the following query (I expected "highlight" to match "highlighted" because of the analyzer):

{
    "query": {
        "query_string" : {
            "query" : "highlight"
        }
    },
    "highlight" : {
        "fields" : {
            "*" : {}
        }
    }
}

But there were no hits in the response at all (did the english analyzer even work here?):

"hits": {
    "hits": [], 
    "total": 0, 
    "max_score": null
}

Finally, consider some curl commands (requests and responses):

curl "http://localhost:9200/tt/_analyze?field=text0" -d "highlighted"

{"tokens":[{ 
    "token":"highlight",
    "start_offset":0,
    "end_offset":11,
    "type":"<ALPHANUM>",
    "position":1
}]}

curl "http://localhost:9200/tt/_analyze?field=text1" -d "highlighted" 

{"tokens":[{
    "token":"highlighted",
    "start_offset":0,
    "end_offset":11,
    "type":"<ALPHANUM>",
    "position":1
}]}

We see that passing the text through the english and standard analyzers gives different results. So, the question: does an analyzer prevent fields from being highlighted? How can I get my fields highlighted in a full-text search?

P.S. I use Elasticsearch v1.4.4 on my local machine with Windows 8.1.

Viacheslav Shalamov asked Oct 19 '22 15:10


1 Answer

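It has to do with your query. You are using the query_string query without specifying a field, so it searches the _all field by default. The _all field is analyzed with the standard analyzer rather than "english", so the query term stays "highlighted", which does not match the stemmed token "highlight" indexed for text0, and a search for "highlight" finds nothing in _all. That is why you're seeing the strange results. Change your query to a multi_match query that searches both fields: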

{
    "query": {
        "multi_match": {
            "fields": [
                "text1",
                "text0"
            ],
            "query": "highlighted"
        }
    },
    "highlight": {
        "fields": {
            "*": {}
        }
    }
} 

Now highlight results for both fields will be returned in the response.
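
If you build the request from a script, the same body can be assembled programmatically. Below is a minimal Python sketch, not a definitive implementation: the helper name build_search_body is hypothetical (it is not part of Elasticsearch or any client library), and the field names are taken from the mapping in the question.

```python
import json


def build_search_body(query_text, fields):
    """Build a search body that runs multi_match on the given fields
    and requests highlighting for every field.

    Hypothetical helper for illustration only.
    """
    return {
        "query": {
            "multi_match": {
                # Each field is searched with its own analyzer,
                # instead of falling back to the _all field.
                "fields": list(fields),
                "query": query_text,
            }
        },
        # "*" requests highlighting on all fields, as in the question.
        "highlight": {"fields": {"*": {}}},
    }


if __name__ == "__main__":
    body = build_search_body("highlighted", ["text1", "text0"])
    # Print the JSON that would be sent as the search request body.
    print(json.dumps(body, indent=4))
```

The printed JSON can then be sent as the body of a search request against the index (in the question, the index is tt on localhost:9200) with whatever HTTP client you prefer.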

Dan Tuffery answered Oct 22 '22 10:10