I have been trying to match a query using the Elasticsearch Python client, but I am unable to get a match even after using escape characters and setting up custom analyzers and mappings. I want to search using & and it's not returning any hits.
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

doc1 = {
    'name': 'numb',
    'band': 'linkin_park',
    'year': '2006'
}
doc2 = {
    'name': 'Powerless &',
    'band': 'linkin_park',
    'year': '2006'
}
doc3 = {
    'name': 'Crawling !',
    'band': 'linkin_park',
    'year': '2006'
}
doc = [doc1, doc2, doc3]
'''
create_index = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "filter": [
                        "lowercase"
                    ],
                    "tokenizer": "whitespace"
                }
            }
        }
    }
}
es.indices.create(index="idx_temp", body=create_index)
'''
for i in range(3):
    es.index(index="idx_temp", doc_type='_doc', id=i, body=doc[i])
my_mapping = {
    "properties": {
        "name": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            },
            "analyzer": "my_analyzer",
            "search_analyzer": "my_analyzer"
        },
        "band": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            },
            "analyzer": "my_analyzer",
            "search_analyzer": "my_analyzer"
        },
        "year": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            },
            "analyzer": "my_analyzer",
            "search_analyzer": "my_analyzer"
        }
    }
}
es.indices.put_mapping(index='idx_temp', body=my_mapping, doc_type='_doc', include_type_name=True)
res = es.search(index='idx_temp', body={
    "query": {
        "match": {
            "name": {
                "query": "powerless &",
                "fuzziness": 3
            }
        }
    }
})

for hit in res['hits']['hits']:
    print(hit['_source'])
The expected output was 'name': 'Powerless &', but I got 0 hits and no value returned.
So I fixed the problem by adding another field,

"search_quote_analyzer": "my_analyzer"

to each field's mapping, right after

"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"

and then I got my output by searching with & in the query, as 'name': 'Powerless &'.
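A minimal sketch of what that could look like for the name field, assuming the same my_analyzer from the commented-out settings block exists on the index (the band and year fields would get the same three analyzer lines):

fixed_mapping = {
    "properties": {
        "name": {
            "type": "text",
            "fields": {
                "keyword": {"type": "keyword", "ignore_above": 256}
            },
            "analyzer": "my_analyzer",
            "search_analyzer": "my_analyzer",
            "search_quote_analyzer": "my_analyzer"  # the line added by this fix
        }
    }
}
es.indices.put_mapping(index='idx_temp', body=fixed_mapping, doc_type='_doc', include_type_name=True)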
I just tried it using your index settings, mapping, and query and was able to get the results. Below are the 2 different things which I did.

First, when I was trying to index the doc using the ES REST API directly, with the below body in Postman:

{ "content": "Powerless \&" }

ES gave me the Unrecognized character escape '&' exception, and even Postman, a popular REST client, warned that it was not a proper JSON string. Then I changed the above payload to the below and was able to index the doc:

{
    "content": "Powerless \\&"
}

Notice I added another `\` to escape the `&`.
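As a sketch only, the equivalent indexing call through the Python client used in the question (the content field and the document id here come from this answer, not from the question's mapping):

# a sketch: index the same payload via the Python client; in Python source,
# "Powerless \\&" is the string Powerless \& (a literal backslash before the &)
es.index(index="idx_temp", doc_type='_doc', id=10, body={"content": "Powerless \\&"})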
Second, I searched on the field that was actually indexed; in your case that is the name field, not the content field. As the match query is analyzed and uses the same analyzer that was used at index time, I was able to get the result.

PS: I also verified your analyzer using the _analyze API, and it generates the below tokens for the text Powerless \\&:
{
    "tokens": [
        {
            "token": "powerless",
            "start_offset": 0,
            "end_offset": 9,
            "type": "word",
            "position": 0
        },
        {
            "token": "\\&",
            "start_offset": 10,
            "end_offset": 12,
            "type": "word",
            "position": 1
        }
    ]
}
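For reference, a sketch of how the same verification could be run from the Python client used in the question, assuming my_analyzer is defined on idx_temp:

# a sketch: check the tokens produced by the custom analyzer via the _analyze API
tokens = es.indices.analyze(index="idx_temp", body={
    "analyzer": "my_analyzer",
    "text": "Powerless \\&"
})
for t in tokens['tokens']:
    print(t['token'])  # with the whitespace tokenizer this prints "powerless" and "\&"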