I'm using Elasticsearch with Python, and I can't find a way to make searches accent-insensitive.
For example, I have two words: "Camión" and "Camion". When a user searches for "camion", I'd like both results to show up.
Creating index:
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])
# ignore=400: don't fail if the index already exists
es.indices.create(index='name', ignore=400)
es.index(
    index="name",
    doc_type="producto",
    id=p.pk,
    body={
        'title': p.titulo,
        'slug': p.slug,
        'summary': p.summary,
        'description': p.description,
        'image': foto,
        'price': p.price,
        'wholesale_price': p.wholesale_price,
        'reference': p.reference,
        'ean13': p.ean13,
        'rating': p.rating,
        'quantity': p.quantity,
        'discount': p.discount,
        'sales': p.sales,
        'active': p.active,
        'encilleria': p.encilleria,
        'brand': marca,
        'brand_title': marca_titulo,
        'sellos': sellos_str,
        'certificados': certificados_str,
        'attr_naturales': attr_naturales_str,
        'soluciones': soluciones_str,
        'categories': categories_str,
        'delivery': p.delivery,
        'stock': p.stock,
        'consejos': p.consejos,
        'ingredientes': p.ingredientes,
        'es_pack': p.es_pack,
        'temp': p.temp,
        'relevancia': p.relevancia,
        'descontinuado': p.descontinuado,
    }
)
Search:
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])
resul = es.search(
    index="name",
    body={
        "query": {
            "query_string": {
                "query": "(title:" + search + " OR description:" + search + " OR summary:" + search + ") AND (active:true)",
                "analyze_wildcard": False
            }
        },
        "size": 9999,
    }
)
print(resul)
I've searched on Google, Stack Overflow and elastic.co, but I didn't find anything that works.
You need to change the mapping of the fields you use in the query. You can't change the mapping of an existing field, so you have to reindex your data. That way the documents are analyzed with the new analyzer and the query can match them.
Basically, you need something like the following. The field called text is just an example; you need to apply the same settings to your other fields (title, summary, description) as well. Note that I used "fields" so that the root field keeps the original text, analyzed with the default analyzer, while text.folded has its accented characters removed, which is what makes your query work. I have also changed the query so that it searches the .folded subfields. A search for camion will then match documents containing either "camion" or "camión".
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "text": {
          "type": "string",
          "fields": {
            "folded": {
              "type": "string",
              "analyzer": "folding"
            }
          }
        }
      }
    }
  }
}
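For the Python client used in the question, you can build the same index definition as a dictionary and pass it to es.indices.create. This is a minimal sketch under a few assumptions. It targets an Elasticsearch 2.x-era cluster and the elasticsearch-py client, with mapping types and "string" fields as in the mapping above. On 5.x and later the field type is "text" instead, and 7.x removes mapping types. The helper names folded_string_field and build_index_body are hypothetical. The index name "name", the type "producto" and the field names are taken from the question.

```python
# Sketch only: assumes an Elasticsearch 2.x-era cluster (mapping types,
# "string" fields). On 5.x+ use "text"; on 7.x+ drop the mapping-type level.
# folded_string_field and build_index_body are hypothetical helper names.


def folded_string_field():
    """Mapping for a string field plus an accent-folded ".folded" subfield."""
    return {
        "type": "string",
        "fields": {
            "folded": {"type": "string", "analyzer": "folding"},
        },
    }


def build_index_body(doc_type, field_names):
    """Index settings with a lowercase + asciifolding analyzer named "folding"."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "folding": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        },
        "mappings": {
            doc_type: {
                "properties": {name: folded_string_field() for name in field_names}
            }
        },
    }


if __name__ == "__main__":
    # Imported here so the helpers above can be used without the client installed.
    from elasticsearch import Elasticsearch

    es = Elasticsearch([{"host": "127.0.0.1", "port": 9200}])
    body = build_index_body("producto", ["title", "summary", "description"])
    # The index must not exist yet: create it, then index the products again.
    es.indices.create(index="name", body=body)
```

Fields that are not listed (price, active and so on) are still mapped dynamically when the first document is indexed. Because the analyzer only applies to documents indexed after the index is created, delete the old index (or create this one under a new name) and index the products again.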
And the query:
"query": {
  "query_string": {
    "query": "\\*.folded:camion"
  }
}
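From Python, an alternative to concatenating the user's input into a query_string is a bool query. It combines a multi_match over the .folded subfields with a filter on active. This is a sketch that swaps in multi_match instead of the query_string shown above. It assumes title, description and summary have been mapped with a .folded subfield in the same way as text above. The helper name build_search_body is hypothetical. With multi_match, the input is treated as plain text, so characters such as ":" or "(" in a search don't break the query syntax. The query text is also analyzed with each field's folding analyzer, so searching "camión" matches as well.

```python
# Sketch only: assumes title/description/summary each have a ".folded"
# subfield analyzed with lowercase + asciifolding. build_search_body is a
# hypothetical helper; multi_match replaces the original query_string.


def build_search_body(search, fields=("title", "description", "summary")):
    """Match the user's text against the folded subfields of active products."""
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": search,
                        "fields": [name + ".folded" for name in fields],
                    }
                },
                # Filter context: restricts results without affecting scoring.
                "filter": {"term": {"active": True}},
            }
        },
        "size": 9999,
    }


if __name__ == "__main__":
    from elasticsearch import Elasticsearch

    es = Elasticsearch([{"host": "127.0.0.1", "port": 9200}])
    resul = es.search(index="name", body=build_search_body("camion"))
    for hit in resul["hits"]["hits"]:
        print(hit["_source"]["title"])
```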
Also, I strongly suggest reading this section of the documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html
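To see roughly what the folding analyzer does to each token, you can approximate its lowercase + asciifolding chain in plain Python with Unicode decomposition. This is only an illustration, not how Lucene implements the filter. The asciifolding filter also maps characters that have no Unicode decomposition (for example "ø" or "ß"), and this sketch leaves those unchanged. The fold function is a hypothetical name.

```python
import unicodedata


def fold(token):
    """Rough approximation of lowercase + asciifolding for a single token.

    NFD normalization splits an accented letter into its base letter plus
    combining marks; dropping the combining marks leaves the base letter.
    Characters without a decomposition (e.g. "ø", "ß") pass through unchanged,
    unlike in Elasticsearch's asciifolding filter.
    """
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for word in ("Camión", "Camion", "camion"):
        print(word, "->", fold(word))
```

All three spellings reduce to the same token, "camion", which is why a query against the .folded subfields finds both documents.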