
Elasticsearch multi-match cross fields query with different query analyzers

USE CASE: I have a collection of companies. Each company has city and country information. I want to run text searches to find, for example, companies in Bangkok, Thailand. All of this information must be searchable in different languages. Example: in Brazil most people refer to Bangkok by its English name rather than the Portuguese one, Banguecoque. So a person searching for companies in Bangkok, Thailand will type bangkok tailandia. Because of this requirement I must be able to search across fields in different languages to retrieve the results.

PROBLEM: When a query is sent without specifying an analyzer, Elasticsearch uses the search_analyzer configured on each field. The problem is that this defeats the purpose of the cross_fields query. This is the analyzer configuration:

"query_analyzer_en": {
    "type": "custom",
    "tokenizer": "standard",
    "filter": [ "lowercase", "asciifolding", "stopwords_en" ]
},
"query_analyzer_pt": {
    "type": "custom",
    "tokenizer": "standard",
    "filter": [ "lowercase", "asciifolding", "stopwords_pt" ]
}

Each analyzer uses a different stop filter for its language.

This is the field configuration:

"dynamic_templates": [{
    "english": {
        "match": "*_txt_en",
        "match_mapping_type": "string",
        "mapping": {
            "type": "string",
            "analyzer": "index_analyzer_en",
            "search_analyzer": "query_analyzer_en"
        }
    }
}, {
    "portuguese": {
        "match": "*_txt_pt",
        "match_mapping_type": "string",
        "mapping": {
            "type": "string",
            "analyzer": "index_analyzer_pt",
            "search_analyzer": "query_analyzer_pt"
        }
    }
}]

This is the query I'm using:

{
   "query": {
      "multi_match" : {
        "query" : "bangkok tailandia",
        "type"  : "cross_fields",
        "operator":   "and",
        "fields" : [ "city_txt_en", "country_txt_pt" ],
        "tie_breaker": 0.0
      }
   },
   "profile": true
}

After profiling the query the result is:

(+city_txt_en:bangkok +city_txt_en:tailandia) 
(+country_txt_pt:bangkok +country_txt_pt:tailandia)

It's not working properly because Elasticsearch is trying to match both terms within the city field, or both terms within the country field. The problem is that the term bangkok is in English and the term tailandia is in Portuguese, so neither field contains both.

If I set an analyzer on the query, the Lucene query comes out the way I expect:

+(city_txt_en:bangkok | country_txt_pt:bangkok) 
+(city_txt_en:tailandia | country_txt_pt:tailandia)

But now the problem is that I must use the same query analyzer for both languages. I need a way to generate the Lucene query above using a different query analyzer for each language.
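
To make the difference between the two query shapes concrete, here is a toy Python model of the grouping behavior: fields are grouped by search analyzer, and terms are only blended across fields that land in the same group. This is a sketch of the behavior seen in the profile output, not Elasticsearch's actual implementation, and the function name cross_fields_sketch is made up for illustration.

```python
def cross_fields_sketch(terms, field_analyzers):
    """Toy model of cross_fields with operator "and".

    field_analyzers maps field name -> search analyzer name.
    Fields sharing an analyzer form one group; every term is required
    in at least one field of the group.
    """
    # Group fields by analyzer name, keeping the order fields were given in.
    groups = {}
    for field, analyzer in field_analyzers.items():
        groups.setdefault(analyzer, []).append(field)

    group_queries = []
    for fields in groups.values():
        clauses = []
        for term in terms:
            alternatives = " | ".join(f"{field}:{term}" for field in fields)
            if len(fields) > 1:
                alternatives = f"({alternatives})"
            clauses.append(f"+{alternatives}")
        group_queries.append(" ".join(clauses))

    # A single group needs no outer parentheses.
    if len(group_queries) == 1:
        return group_queries[0]
    return " ".join(f"({query})" for query in group_queries)


terms = ["bangkok", "tailandia"]

# Each field has its own search analyzer: two groups, one per field.
print(cross_fields_sketch(terms, {
    "city_txt_en": "query_analyzer_en",
    "country_txt_pt": "query_analyzer_pt",
}))

# One analyzer forced on the whole query: a single blended group.
print(cross_fields_sketch(terms, {
    "city_txt_en": "forced_analyzer",
    "country_txt_pt": "forced_analyzer",
}))
```

The first call reproduces the shape of the profiled query and the second reproduces the shape I expect, which suggests the grouping by analyzer is what changes the result.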

Bruno dos Santos asked Mar 17 '16 04:03




2 Answers

You should be able to implement this using query_string. The query_string query splits the text into terms and then applies each term to every field, using that field's own analyzer. Example:

{
   "query": {
      "query_string" : {
        "query" : "bangkok tailandia",
        "default_operator":   "AND",
        "fields" : [ "city_txt_en", "country_txt_pt" ]
      }
   },
   "profile": true
}
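
As a rough illustration of the behavior described above (split into terms first, then analyze each term per field), here is a toy Python model. The stop word sets are hypothetical stand-ins for stopwords_en and stopwords_pt, asciifolding is not modeled, and per_term_query is an invented name; it only mimics the shape of the resulting Lucene query. Whether a given Elasticsearch version actually splits the text this way is worth confirming with "profile": true.

```python
def make_analyzer(stopwords):
    """Return a toy analyzer: lowercase the term, drop it if it is a stop word."""
    def analyze(term):
        token = term.lower()
        return None if token in stopwords else token
    return analyze


# Hypothetical stand-ins for query_analyzer_en and query_analyzer_pt.
FIELD_ANALYZERS = {
    "city_txt_en": make_analyzer({"in", "the", "of"}),
    "country_txt_pt": make_analyzer({"em", "na", "de"}),
}


def per_term_query(text, field_analyzers):
    """Build one required clause per term, analyzing the term per field."""
    clauses = []
    for term in text.split():
        parts = []
        for field, analyze in field_analyzers.items():
            token = analyze(term)
            if token is not None:  # this field's analyzer kept the term
                parts.append(f"{field}:{token}")
        if parts:  # skip terms that every analyzer removed
            clauses.append("+(" + " | ".join(parts) + ")")
    return " ".join(clauses)


print(per_term_query("bangkok tailandia", FIELD_ANALYZERS))
```

In this model each term can match in either field, and each field still applies its own stop word list, which is the combination the question asks for.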

keety answered Oct 14 '22 21:10


According to the docs, cross_fields can only work in term-centric mode on fields that have the same analyzer.

What you could do, however, is split your query into two parts like this, where each part has an equal chance of matching. Here you could use a plain match, since each multi_match has a single field, but you can also add other fields that share the same analyzer to each sub-query:

{
    "bool": {
        "should": [
            {
              "multi_match" : {
                "query" : "bangkok tailandia",
                "type":       "cross_fields",
                "operator":   "and",
                "fields" : [ "city_txt_en" ],
                "minimum_should_match": "50%" 
              }
            },
            {
              "multi_match" : {
                "query" : "bangkok tailandia",
                "type":       "cross_fields",
                "operator":   "and",
                "fields" : [ "country_txt_pt" ]
              }
            }
        ]
    }
}
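
If more fields get added per language, the same structure can be generated instead of written by hand. The sketch below is a minimal Python helper that assumes the language can be read from the field name suffix (_en, _pt), as in the dynamic templates from the question. The helper name build_per_language_query and the field name_txt_en are hypothetical, and minimum_should_match is left out for brevity.

```python
import json
from collections import defaultdict


def build_per_language_query(text, fields):
    """Build a bool/should with one cross_fields multi_match per language suffix."""
    # Group fields by their trailing language code, e.g. "city_txt_en" -> "en".
    groups = defaultdict(list)
    for field in fields:
        language = field.rsplit("_", 1)[-1]
        groups[language].append(field)

    should = [
        {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "operator": "and",
                "fields": group_fields,
            }
        }
        for _, group_fields in sorted(groups.items())
    ]
    return {"bool": {"should": should}}


# name_txt_en is a made-up extra field sharing the English analyzer.
body = build_per_language_query(
    "bangkok tailandia",
    ["city_txt_en", "name_txt_en", "country_txt_pt"],
)
print(json.dumps(body, indent=2))
```

Each sub-query then only contains fields with the same search analyzer, so cross_fields can stay term-centric inside it.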

Val answered Oct 14 '22 19:10