Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Elasticsearch insensitive search accents

I'm using Elastic search with Python. I can't find a way to make insensitive search with accents.

For example: I have two words. "Camión" and "Camion". When a user search for "camion" I'd like the two results show up.

Creating index:

es = Elasticsearch([{u'host': u'127.0.0.1', u'port': b'9200'}])

es.indices.create(index='name', ignore=400)

es.index(
    index="name",
    doc_type="producto",
    id=p.pk,
    body={
        'title': p.titulo,
        'slug': p.slug,
        'summary': p.summary,
        'description': p.description,
        'image': foto,
        'price': p.price,
        'wholesale_price': p.wholesale_price,
        'reference': p.reference,
        'ean13': p.ean13,
        'rating': p.rating,
        'quantity': p.quantity,
        'discount': p.discount,
        'sales': p.sales,
        'active': p.active,
        'encilleria': p.encilleria,
        'brand': marca,
        'brand_title': marca_titulo,
        'sellos': sellos_str,
        'certificados': certificados_str,
        'attr_naturales': attr_naturales_str,
        'soluciones': soluciones_str,
        'categories': categories_str,
        'delivery': p.delivery,
        'stock': p.stock,
        'consejos': p.consejos,
        'ingredientes': p.ingredientes,
        'es_pack': p.es_pack,
        'temp': p.temp,
        'relevancia': p.relevancia,
        'descontinuado': p.descontinuado,
    }

Search:

    from elasticsearch import Elasticsearch
    es = Elasticsearch([{'host': '127.0.0.1', 'port': '9200'}])

    resul = es.search(
        index="name",
        body={
            "query": {
                "query_string": {
                    "query": "(title:" + search + " OR description:" + search + " OR summary:" + search + ") AND (active:true)",
                    "analyze_wildcard": False
                }
            },
            "size": "9999",
        }
    )
    print resul

I've searched on Google, Stackoverflow and elastic.co but I didn't find anything that works.

like image 751
Marcos Aguayo Avatar asked Feb 06 '23 13:02

Marcos Aguayo


1 Answers

You need to change the mapping of those fields you have in the query. Changing the mapping requires re-indexing so that the fields will be analyzed differently and the query will work.

Basically, you need something like the following below. The field called text is just an example. You need to apply the same settings for other fields as well. Note that I used fields in there so that the root field will maintain the original text analyzed by default, while text.folded will remove the accented characters and will make it possible for your query to work. I have also changed the query a bit so that you search both versions of that field (camion will match, but also camión).

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "text": {
          "type": "string",
          "fields": {
            "folded": {
              "type": "string",
              "analyzer": "folding"
            }
          }
        }
      }
    }
  }
}

And the query:

  "query": {
    "query_string": {
      "query": "\\*.folded:camion"
    }
  }

Also, I strongly suggest reading this section of the documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html

like image 139
Andrei Stefan Avatar answered Feb 12 '23 16:02

Andrei Stefan