I've created this test index using the Marvel plugin (Sense):
POST /test
{
  "index": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  }
}
And I'm making the analyze request like this:
GET /test/_analyze?analyzer=folding&text=olá
And I'm getting the result:
{
  "tokens": [
    {
      "token": "ol",
      "start_offset": 0,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
But I need the token to be "ola", not just "ol". According to the documentation, the analyzer is configured properly:
https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html
What am I doing wrong?
Try the following to confirm that Elasticsearch itself folds the text correctly. I suspect the Sense interface is not passing the correct text to the analyzer.
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "folding"
        }
      }
    }
  }
}
POST /my_index/test/1
{
  "text": "olá"
}
GET /my_index/test/_search
{
  "fielddata_fields": ["text"]
}
The result:
"hits": {
  "total": 1,
  "max_score": 1,
  "hits": [
    {
      "_index": "my_index",
      "_type": "test",
      "_id": "1",
      "_score": 1,
      "_source": {
        "text": "olá"
      },
      "fields": {
        "text": [
          "ola"
        ]
      }
    }
  ]
}
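If the text really is being altered before it reaches the analyzer, one way to rule that out is to percent-encode the query string yourself before sending the _analyze request from the question. The following is only a minimal sketch in Python; the helper name build_analyze_path is hypothetical, and it assumes the request is sent as UTF-8, which is what urlencode produces by default.

```python
from urllib.parse import urlencode


def build_analyze_path(index: str, analyzer: str, text: str) -> str:
    """Return the _analyze path with its query string percent-encoded as UTF-8.

    Hypothetical helper for illustration; it only builds the path and does
    not contact Elasticsearch.
    """
    query = urlencode({"analyzer": analyzer, "text": text})
    return f"/{index}/_analyze?{query}"


# The non-ASCII "á" becomes the two UTF-8 bytes %C3%A1.
print(build_analyze_path("test", "folding", "olá"))
```

Sending the encoded path with any HTTP client, instead of typing the raw "olá" into the URL, should show whether the "ol" token comes from the client rather than from the analyzer.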
With more recent versions of Elasticsearch, you can configure the "asciifolding" token filter like this:
PUT index_name
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}
Sample query:
POST index_name/_analyze
{
  "analyzer": "my_analyzer",
  "text": "olá"
}
Output:
{
  "tokens": [
    {
      "token": "ola",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
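To get a feel for what the "lowercase" and "asciifolding" filters do to a token like this one, here is a rough approximation in Python. It is not how Elasticsearch implements the filter: this sketch assumes that Unicode decomposition plus dropping combining marks is close enough for accented Latin letters, and the function name fold is made up for illustration. Characters without a decomposition (for example "ø") are left unchanged here, whereas the real filter may handle them differently.

```python
import unicodedata


def fold(text: str) -> str:
    """Lowercase the text and strip accents (approximation of ASCII folding)."""
    # NFKD splits "á" into the base letter "a" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold("olá"))  # ola
```

The result matches the "ola" token in the output above, which is the value the analyzer is expected to index.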