I'm trying to make Elasticsearch ignore hyphens. I don't want it to split the text on either side of a hyphen into separate words. It seems simple, but I'm banging my head against the wall.
I want the string "Roland JD-Xi" to produce the following terms: [ roland jd-xi, roland, jd-xi, jdxi, roland jdxi ]
I haven't been able to achieve this easily. Most people will just type 'jdxi', so my initial thought was to simply remove the hyphen. I'm using the following field definition:
"name": {
  "type": "string",
  "analyzer": "language",
  "include_in_all": true,
  "boost": 5,
  "fields": {
    "my_standard": {
      "type": "string",
      "analyzer": "my_standard"
    },
    "my_prefix": {
      "type": "string",
      "analyzer": "my_text_prefix",
      "search_analyzer": "my_standard"
    },
    "my_suffix": {
      "type": "string",
      "analyzer": "my_text_suffix",
      "search_analyzer": "my_standard"
    }
  }
}
And the relevant analysers and filters are defined as follows:
{
  "number_of_replicas": 0,
  "number_of_shards": 1,
  "analysis": {
    "analyzer": {
      "std": {
        "tokenizer": "standard",
        "char_filter": "html_strip",
        "filter": ["standard", "elision", "asciifolding", "lowercase", "stop", "length", "strip_hyphens"]
      },
      ...
      "my_text_prefix": {
        "tokenizer": "whitespace",
        "char_filter": "my_filter",
        "filter": ["standard", "elision", "asciifolding", "lowercase", "stop", "edge_ngram_front"]
      },
      "my_text_suffix": {
        "tokenizer": "whitespace",
        "char_filter": "my_filter",
        "filter": ["standard", "elision", "asciifolding", "lowercase", "stop", "edge_ngram_back"]
      },
      "my_standard": {
        "type": "custom",
        "tokenizer": "whitespace",
        "char_filter": "my_filter",
        "filter": ["standard", "elision", "asciifolding", "lowercase"]
      }
    },
    "char_filter": {
      "my_filter": {
        "type": "mapping",
        "mappings": ["- => ", ". => "]
      }
    },
    "filter": {
      "edge_ngram_front": {
        "type": "edgeNGram",
        "min_gram": 1,
        "max_gram": 20,
        "side": "front"
      },
      "edge_ngram_back": {
        "type": "edgeNGram",
        "min_gram": 1,
        "max_gram": 20,
        "side": "back"
      },
      "strip_spaces": {
        "type": "pattern_replace",
        "pattern": "\\s",
        "replacement": ""
      },
      "strip_dots": {
        "type": "pattern_replace",
        "pattern": "\\.",
        "replacement": ""
      },
      "strip_hyphens": {
        "type": "pattern_replace",
        "pattern": "-",
        "replacement": ""
      },
      "stop": {
        "type": "stop",
        "stopwords": "_none_"
      },
      "length": {
        "type": "length",
        "min": 1
      }
    }
  }
}
I've been able to test this (i.e. with _analyze), and the string "Roland JD-Xi" is tokenised as [ roland, jdxi ].
It's not exactly what I want, but it's close enough, since it should match 'jdxi'.
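For reference, a call along these lines shows the tokens produced by the my_standard sub-field (my_index is just a placeholder for my actual index name, and the query-parameter _analyze syntax here is the one used by older ES versions):

curl -XGET 'localhost:9200/my_index/_analyze?field=name.my_standard&text=Roland+JD-Xi&pretty'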
But that's my problem. If I do a simple "index/_search?q=jdxi" it doesn't bring back the document, yet "index/_search?q=roland+jdxi" does bring it back.
So at least I know the hyphens are being removed, but if the tokens "roland" and "jdxi" are being created, how come "index/_search?q=jdxi" doesn't match the document?
As an aside, the key difference between analyzers and normalizers is that normalizers can only emit a single token, while analyzers can emit many. Since they only emit one token, normalizers do not use a tokenizer. They do use character filters and token filters, but are limited to those that work on one character at a time.
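For illustration only (this is not part of your setup, and normalizers only apply to keyword fields in later ES versions), a custom normalizer is declared much like an analyzer, just without a tokenizer:

"analysis": {
  "normalizer": {
    "my_normalizer": {
      "type": "custom",
      "char_filter": [],
      "filter": ["lowercase", "asciifolding"]
    }
  }
}

It would then be referenced from a keyword field with "normalizer": "my_normalizer".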
In a nutshell, an analyzer tells Elasticsearch how text should be indexed and searched. What you're using to test is the Analyze API, which is a very handy tool for understanding how analyzers behave: you provide the text directly to the API, and it doesn't have to come from any indexed document.
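For example, you can call it without referencing any index at all (shown here with the query-parameter syntax of older ES versions):

curl -XGET 'localhost:9200/_analyze?analyzer=standard&text=Roland+JD-Xi&pretty'

With the standard analyzer this returns [ roland, jd, xi ], which is exactly the hyphen splitting you're trying to avoid.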
A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. For instance, a whitespace tokenizer breaks text into tokens whenever it sees any whitespace.
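You can check that in isolation too, for instance (again assuming the older query-parameter syntax):

curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&text=Roland+JD-Xi&pretty'

Because only the whitespace tokenizer runs here, with no filters, this should return [ Roland, JD-Xi ]: the hyphen is left alone, and it's your mapping char filter (my_filter) that actually strips it before tokenization.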
I've reproduced your case on ES 6, and searching with index/_search?q=jdxi returns the document.
The issue could be that when you search with index/_search?q=jdxi without specifying a field, the query basically runs against _all, which contains whatever was in the name field (so it's essentially the same as index/_search?q=name:jdxi). Since that field was not analyzed with your my_standard analyzer, you don't get any results.
What you should do instead is search against the my_standard sub-field, i.e. index/_search?q=name.my_standard:jdxi, and I'm pretty sure you'll get the document.
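If you prefer the query DSL over URI search, the equivalent would be something along these lines (my_index is a placeholder for your actual index name):

curl -XGET 'localhost:9200/my_index/_search?pretty' -d '{
  "query": {
    "match": {
      "name.my_standard": "jdxi"
    }
  }
}'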