Elasticsearch search for Turkish characters

Question

I have some documents that I am indexing with Elasticsearch. Some of them are written in upper case, and their Turkish characters have been replaced with plain ASCII letters. For example, "kürşat" is written as "KURSAT".

I want to find this document by searching for "kürşat". How can I do that?

Thanks

Byron Voorbach · Accepted Answer

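Take a look at the asciifolding token filter. It converts alphabetic, numeric and symbolic Unicode characters that are not in the Basic Latin block into their ASCII equivalents, if one exists, so 'ü' becomes 'u' and 'ş' becomes 's'.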
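To see what lowercasing followed by ASCII folding does to individual words, here is a rough Python approximation. It is only a sketch under stated assumptions: it uses Unicode NFKD decomposition plus a hand-written mapping for the dotless 'ı' instead of Lucene's actual folding table, and it splits on whitespace instead of using the standard tokenizer. The helpers `fold_to_ascii` and `analyze` are hypothetical names, and the `preserve_original` argument only mimics the filter option of the same name.

```python
import unicodedata


def fold_to_ascii(token: str) -> str:
    """Rough stand-in for ASCII folding; NOT Lucene's real mapping table."""
    # NFKD splits accented letters into a base letter plus combining marks,
    # e.g. "ü" -> "u" + U+0308 and "ş" -> "s" + U+0327; drop the marks.
    decomposed = unicodedata.normalize("NFKD", token)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # The Turkish dotless "ı" has no decomposition, so map it by hand.
    return stripped.replace("ı", "i")


def analyze(text: str, preserve_original: bool) -> list[str]:
    """Hypothetical approximation of a lowercase + asciifolding chain."""
    tokens = []
    for raw in text.split():  # whitespace split stands in for the standard tokenizer
        lowered = raw.lower()
        folded = fold_to_ascii(lowered)
        tokens.append(folded)
        # Mimic preserve_original: also keep the accented form if folding changed it.
        if preserve_original and folded != lowered:
            tokens.append(lowered)
    return tokens


print(analyze("kürşat", preserve_original=True))   # ['kursat', 'kürşat']
print(analyze("KURSAT", preserve_original=True))   # ['kursat']
print(analyze("kürşat", preserve_original=False))  # ['kursat']
```

Both spellings end up sharing the token 'kursat', which is why a query for either one can find both documents.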

Here is a small example for you to try out in Sense:

Index:

DELETE test
PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      },
      "analyzer": {
        "turkish_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_ascii_folding"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "turkish_analyzer"
        }
      }
    }
  }
}

POST test/test/1
{
  "name": "kürşat"
}

POST test/test/2
{
  "name": "KURSAT"
}

Query:

GET test/_search
{
  "query": {
    "match": {
      "name": "kursat"
    }
  }
}

Response:

 "hits": {
    "total": 2,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "2",
        "_score": 0.30685282,
        "_source": {
          "name": "KURSAT"
        }
      },
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.30685282,
        "_source": {
          "name": "kürşat"
        }
      }
    ]
  }

Query:

GET test/_search
{
  "query": {
    "match": {
      "name": "kürşat"
    }
  }
}

Response:

 "hits": {
    "total": 2,
    "max_score": 0.4339554,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.4339554,
        "_source": {
          "name": "kürşat"
        }
      },
      {
        "_index": "test",
        "_type": "test",
        "_id": "2",
        "_score": 0.09001608,
        "_source": {
          "name": "KURSAT"
        }
      }
    ]
  }

The 'preserve_original' flag makes the filter emit the original token ('kürşat') alongside the folded one ('kursat'). As a result, when a user searches for 'kürşat', documents containing that exact spelling are ranked higher than documents that only contain 'kursat' (notice the difference in scores between the two query responses).

If you want both documents to get the same score, set the flag to false.
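The effect of the flag on ranking can be illustrated with another rough Python sketch. It approximates the analyzer the same way as before (NFKD decomposition plus a manual 'ı' mapping, not Lucene's folding table) and simply counts how many query terms each document contains. The helper names are hypothetical, and the count is only a crude proxy: Elasticsearch's real score also depends on term statistics, but matching both terms is one reason the exact 'kürşat' document scores higher.

```python
import unicodedata


def analyze(text: str, preserve_original: bool) -> list[str]:
    """Hypothetical approximation of lowercase + asciifolding (not Lucene's tables)."""
    tokens = []
    for raw in text.split():
        lowered = raw.lower()
        decomposed = unicodedata.normalize("NFKD", lowered)
        folded = "".join(c for c in decomposed if not unicodedata.combining(c))
        folded = folded.replace("ı", "i")  # dotless i has no NFKD decomposition
        tokens.append(folded)
        if preserve_original and folded != lowered:
            tokens.append(lowered)  # keep the accented form as an extra term
    return tokens


def matched_terms(query: str, doc: str, preserve_original: bool) -> int:
    """Count query terms that also appear among the document's terms."""
    doc_terms = set(analyze(doc, preserve_original))
    return sum(1 for term in analyze(query, preserve_original) if term in doc_terms)


docs = {"1": "kürşat", "2": "KURSAT"}
for flag in (True, False):
    counts = {doc_id: matched_terms("kürşat", text, flag) for doc_id, text in docs.items()}
    print(f"preserve_original={flag}: {counts}")
# preserve_original=True: {'1': 2, '2': 1}
# preserve_original=False: {'1': 1, '2': 1}
```

With the flag off, both documents are indexed as the single term 'kursat', so they match the query in exactly the same way.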

Hope I got your problem right!

Tags:

elasticsearch

Kursat Serolar