
Elasticsearch wildcard, regexp, match_phrase, prefix query returning wrong results

I have just started using Elasticsearch, version 7.5.1.

I want to query results which start with a particular word fragment. For example tho* should return data containing:

thought, Thomson, those, etc.

I tried with -

  1. Regexp
[{'regexp':{'f1':'tho.*'}},{'regexp':{'f2':'tho.*'}}]
  2. Wildcard
[{'wildcard':{'f1':'tho*'}},{'wildcard':{'f2':'tho*'}}]
  3. Prefix
[{'prefix':{'f1':'tho'}},{'prefix':{'f2':'tho'}}]
  4. match_phrase
{'multi_match': {'query': 'tho', 'fields': ['f1', 'f2', 'f3'], 'type': 'phrase'}}
# also tried with type phrase_prefix

All of these return the expected results, but they all also return the word method.

Similarly, cat* returns the word communication.

What am I doing wrong? Is this related to the analyzer?

  • Edit - Here is the field mapping -
'f1': {
    'full_name': 'f1',
    'mapping': {
        'f1': {
            'type': 'text',
            'analyzer': 'some_analyzer',
            'index_phrases': true
        }
    }
},
Aditya asked Nov 06 '22 01:11


1 Answer

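Since you have not provided your index mapping, I can only guess, but because method also appears in the search results, I think the problem lies with the analyzer you have set.

One possibility is that the analyzer uses an ngram tokenizer, which splits each word into fragments and so produces the token tho for every word that contains it, including method. These queries are matched against the indexed tokens rather than the original text, so a tho token from anywhere inside a word is enough for that document to match.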
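To make the ngram explanation concrete, here is a minimal Python sketch that imitates the effect outside Elasticsearch. It is only a simulation: the gram length of 3 and the helper names (ngrams, whole_word, prefix_hits) are assumptions made for illustration, and the real settings of some_analyzer are unknown.

```python
def ngrams(word, min_gram=3, max_gram=3):
    """Return every lowercase substring of `word` with length min_gram..max_gram.

    Rough stand-in for an ngram tokenizer; the gram sizes are assumed.
    """
    word = word.lower()
    return {
        word[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(word) - n + 1)
    }


def whole_word(word):
    """Stand-in for an analyzer that keeps each word as one lowercase token."""
    return {word.lower()}


def prefix_hits(words, prefix, tokenize):
    """Return the words that have at least one token starting with `prefix`."""
    return [w for w in words if any(t.startswith(prefix) for t in tokenize(w))]


words = ["method", "thought", "Thomson", "those"]

# With ngram tokens, "method" is indexed as met, eth, tho, hod, so it matches.
print(prefix_hits(words, "tho", ngrams))

# With one token per word, only words that really start with "tho" match.
print(prefix_hits(words, "tho", whole_word))
```

The same reasoning would explain cat* matching communication, since cat occurs in the middle of that word.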

Here is a working example with the index mapping, index data, search queries, and search result. The mapping sets no analyzer, so the default standard analyzer is used.

Index Mapping:

{
  "mappings": {
    "properties": {
      "f1": {
        "type": "text"
      }
    }
  }
}
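If the custom analyzer on f1 has to stay for other searches, one possible variation is to keep it and add a sub-field analyzed with the standard analyzer, then run the prefix-style queries against that sub-field. The Python sketch below only builds the request bodies; the helper names and the sub-field name std are hypothetical, and some_analyzer would still need to be defined in the index settings.

```python
import json


def mapping_with_standard_subfield(field, custom_analyzer, subfield="std"):
    """Build a mapping that keeps the custom analyzer on `field` and adds a
    sub-field analyzed with the built-in standard analyzer."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": custom_analyzer,
                    "fields": {
                        subfield: {"type": "text", "analyzer": "standard"}
                    },
                }
            }
        }
    }


def prefix_query(field, prefix, subfield="std"):
    """Build a prefix query that targets the standard-analyzed sub-field."""
    return {"query": {"prefix": {f"{field}.{subfield}": {"value": prefix}}}}


print(json.dumps(mapping_with_standard_subfield("f1", "some_analyzer"), indent=2))
print(json.dumps(prefix_query("f1", "tho"), indent=2))
```

Existing documents would have to be reindexed before the new sub-field contains any tokens.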

Index Data:

{
  "f1": "method"
}
{
  "f1": "thought"
}
{
  "f1": "Thomson"
}
{
  "f1": "those"
}

Search Query using Wildcard Query:

{
  "query": {
    "wildcard": {
      "f1": {
        "value": "tho*"
      }
    }
  }
}

Search Query using Prefix Query:

{
  "query": {
    "prefix": {
      "f1": {
        "value": "tho"
      }
    }
  }
}

Search Query using Regexp query:

{
  "query": {
    "regexp": {
      "f1": {
        "value": "tho.*"
      }
    }
  }
}

Search Query using Match Phrase Prefix Query:

{
  "query": {
    "match_phrase_prefix": {
      "f1": {
        "query": "tho"
      }
    }
  }
}

The search result for all four queries above is:

"hits": [
      {
        "_index": "67673694",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.2039728,
        "_source": {
          "f1": "thought"
        }
      },
      {
        "_index": "67673694",
        "_type": "_doc",
        "_id": "2",
        "_score": 1.2039728,
        "_source": {
          "f1": "Thomson"
        }
      },
      {
        "_index": "67673694",
        "_type": "_doc",
        "_id": "3",
        "_score": 1.2039728,
        "_source": {
          "f1": "those"
        }
      }
    ]
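To check whether the analyzer on your own index is the cause, you can ask Elasticsearch which tokens it produces for a word such as method through the _analyze API. Below is a rough Python sketch using only the standard library; the base URL http://localhost:9200 and the index name my-index are placeholders, and the helper names are made up for this example.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, field, text):
    """Build a POST request for <index>/_analyze that analyzes `text`
    with the analyzer configured on `field`."""
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_for(base_url, index, field, text):
    """Send the request and return the token strings (needs a running cluster)."""
    req = build_analyze_request(base_url, index, field, text)
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return [t["token"] for t in payload["tokens"]]


# Placeholder URL and index name; nothing is sent until tokens_for() is called.
req = build_analyze_request("http://localhost:9200", "my-index", "f1", "method")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

If the returned tokens for method include tho, the analyzer is splitting words into fragments and that would explain the extra hits.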
ESCoder answered Nov 15 '22 05:11