
Unable to search a query with symbols in elasticsearch

I have been trying to match a query using the Elasticsearch Python client, but I cannot get a match even after using escape characters and setting up a custom analyzer and mapping it to the fields. I want to search using & and it's not returning any results.

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])


doc1 = {
    'name': 'numb',
    'band': 'linkin_park',
    'year': '2006'
}

doc2 = {
    'name': 'Powerless &',
    'band': 'linkin_park',
    'year': '2006'
}
doc3 = {
    'name': 'Crawling !',
    'band': 'linkin_park',
    'year': '2006'
}

doc = [doc1, doc2, doc3]
'''
create_index = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "filter": [
                        "lowercase"
                    ],
                    "tokenizer": "whitespace"
                }
            }
        }
    }
}

es.indices.create(index="idx_temp", body=create_index)
'''
for i in range(3):
    es.index(index="idx_temp", doc_type='_doc', id=i, body=doc[i])


my_mapping = {
  "properties": {
      "name": {
          "type": "text",
          "fields": {
              "keyword": {
                  "type": "keyword",
                  'ignore_above': 256
              }
          },
          "analyzer": "my_analyzer",
          "search_analyzer": "my_analyzer"
      },
      "band": {
          "type": "text",
          "fields": {
              "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
              }
          },
          "analyzer": "my_analyzer",
          "search_analyzer": "my_analyzer"
      },
      "year": {
          "type": "text",
          "fields": {
              "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
              }
          },
          "analyzer": "my_analyzer",
          "search_analyzer": "my_analyzer"
      }
  }
}

es.indices.put_mapping(index='idx_temp', body=my_mapping, doc_type='_doc', include_type_name=True)

res = es.search(index='idx_temp', body={
    "query": {
        "match": {
            "name": {
                "query": "powerless &",
                "fuzziness": 3

            }
        }
    }
})

for hit in res['hits']['hits']:
    print(hit['_source'])

The expected output was 'name': 'Powerless &', but I got 0 hits and nothing was returned.

Yaboku asked Jul 04 '19 10:07



2 Answers

I fixed the problem by adding another setting,

"search_quote_analyzer": "my_analyzer"

to each field in the mapping, after

"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer",

Searching with & in the query now returns the expected output:

'name': 'Powerless &'
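
As a rough illustration of where the extra setting sits, the mapping from the question could be built as below. This is only a sketch: the helper name text_field is hypothetical and not part of the original post, and it assumes the index already defines my_analyzer in its analysis settings before the mapping is applied.

```python
# Hypothetical helper (not from the original post): builds one text field
# mapping that carries the three analyzer settings discussed in this answer.
def text_field(analyzer_name):
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        "analyzer": analyzer_name,
        "search_analyzer": analyzer_name,
        "search_quote_analyzer": analyzer_name,
    }


# Same three fields as in the question, each using the custom analyzer.
my_mapping = {
    "properties": {
        field: text_field("my_analyzer") for field in ("name", "band", "year")
    }
}

print(my_mapping["properties"]["name"]["search_quote_analyzer"])
```

The resulting dict has the same shape as the my_mapping body in the question, so it could be passed to put_mapping in the same way.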
Yaboku answered Sep 27 '22 16:09


I just tried it using your index settings, mapping, and query, and was able to get the results. Below are the two things I did.

  1. I escaped the special character & when indexing the doc through the ES REST API directly, using the body below in Postman:

{ "content": "Powerless \&" }

ES then threw an Unrecognized character escape '&' exception, and Postman, a popular REST client, also warned me that this was not a proper string.

I then changed the payload to the one below, adding another `\` to escape the `&`, and was able to index the doc:

{
    "content": "Powerless \\&"
}

  2. I changed the query to use the same field that holds the value containing &. In your case that is the name field, not the content field. A match query is analyzed and uses the same analyzer that is used at index time. With that change I was able to get the result.
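
The escaping behaviour described in the first point can be reproduced without Elasticsearch, using Python's standard json module as a stand-in for the JSON parser on the server side. This is only a sketch of how the three payload variants parse; the variable names are made up for illustration.

```python
import json

# "\&" is not one of the escape sequences JSON allows, so parsing fails.
try:
    json.loads(r'{"content": "Powerless \&"}')
    single_backslash_ok = True
except json.JSONDecodeError:
    single_backslash_ok = False

# "\\" is a valid escape for a literal backslash, so this parses,
# but the stored value now contains a backslash before the "&".
escaped = json.loads(r'{"content": "Powerless \\&"}')

# A bare "&" is an ordinary character inside a JSON string.
plain = json.loads('{"content": "Powerless &"}')

print(single_backslash_ok)
print(escaped["content"])
print(plain["content"])
```

Note that in the double-backslash variant the backslash becomes part of the indexed text, which is consistent with the token containing a backslash in the _analyze output.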

PS: I also verified your analyzer using the _analyze API, and it generates the tokens below for the text Powerless \\&

{
    "tokens": [
        {
            "token": "powerless",
            "start_offset": 0,
            "end_offset": 9,
            "type": "word",
            "position": 0
        },
        {
            "token": "\\&",
            "start_offset": 10,
            "end_offset": 12,
            "type": "word",
            "position": 1
        }
    ]
}
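
To see why the symbol survives as its own token, the analyzer from the question (whitespace tokenizer plus lowercase filter) can be approximated in plain Python. This is a rough stand-in and not how Elasticsearch implements it; the function name analyze is an assumption for illustration, and it only splits on whitespace and lowercases each piece.

```python
import re


def analyze(text):
    """Approximate a whitespace tokenizer followed by a lowercase filter."""
    tokens = []
    # Every run of non-whitespace characters becomes one token.
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "type": "word",
            "position": position,
        })
    return tokens


# The Python literal "Powerless \\&" is the 12-character text Powerless \&
for tok in analyze("Powerless \\&"):
    print(tok)
```

Because nothing except whitespace splits the text, punctuation such as & or ! is kept inside or as a token, so a query analyzed the same way can match it.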
Amit answered Sep 27 '22 15:09