Elasticsearch 1.6
I want to index text that contains hyphens, for example "U-12", "U-17", "WU-12", "t-shirt", and to be able to search it with a "Simple Query String" query.
Data sample (simplified):
{
  "title": "U-12 Soccer",
  "comment": "the t-shirts are dirty"
}
As there are already quite a few questions about hyphens, I first tried an existing solution:
Use a char filter: ElasticSearch - Searching with hyphens in name.
So I went for this mapping:
{
  "settings": {
    "analysis": {
      "char_filter": {
        "myHyphenRemoval": {
          "type": "mapping",
          "mappings": [
            "-=>"
          ]
        }
      },
      "analyzer": {
        "default": {
          "type": "custom",
          "char_filter": [ "myHyphenRemoval" ],
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "title": {
          "type": "string"
        },
        "comment": {
          "type": "string"
        }
      }
    }
  }
}
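Conceptually, the analyzer above first strips hyphens with the char filter, then tokenizes and lowercases. A minimal Python sketch of that pipeline (my own simulation for illustration, not Elasticsearch code):

```python
import re

def analyze(text):
    """Rough simulation of the custom analyzer above:
    the char filter "-=>" removes hyphens, then a crude stand-in
    for the standard tokenizer splits on word characters, and the
    lowercase filter normalizes the tokens."""
    filtered = text.replace("-", "")        # char filter: "-=>"
    tokens = re.findall(r"\w+", filtered)   # simplified standard tokenizer
    return [t.lower() for t in tokens]      # lowercase token filter

print(analyze("U-12 Soccer"))             # ['u12', 'soccer']
print(analyze("the t-shirts are dirty"))  # ['the', 'tshirts', 'are', 'dirty']
```

So with this mapping, "U-12" is stored in the index as the single token u12.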
Searching is done with the following query:
{
  "_source": true,
  "query": {
    "simple_query_string": {
      "query": "<Text>",
      "default_operator": "AND"
    }
  }
}
What works:
"U-12", "U*", "t*", "ts*"
What didn't work:
"U-*", "u-1*", "t-*", "t-sh*", ...
So it seems the char filter is not executed on the search string? What can I do to make this work?
If anyone is still looking for a simple workaround to this issue: replace hyphens with underscores (_) when indexing data.
For example, O-000022334 should be indexed as O_000022334.
Apply the same replacement to the search string, and replace the underscores back with hyphens when displaying results. This way you can search for "O-000022334" and it will find the correct match.
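A sketch of that workaround in Python (the function names are my own; note it only round-trips cleanly if the original values never contain underscores themselves):

```python
def to_indexed(value: str) -> str:
    """Before indexing (and before searching): replace hyphens with
    underscores so the standard tokenizer keeps the identifier as one token."""
    return value.replace("-", "_")

def to_display(value: str) -> str:
    """When rendering results: restore the original hyphens.
    Assumes the source data contained no underscores of its own."""
    return value.replace("_", "-")

stored = to_indexed("O-000022334")  # 'O_000022334' is what gets indexed
print(stored)
print(to_display(stored))           # 'O-000022334' is what the user sees
```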
The answer is really simple:
Quote from Igor Motov: Configuring the standard tokenizer
By default the simple_query_string query doesn't analyze the words with wildcards. As a result it searches for all tokens that start with i-ma. The word i-mac doesn't match this request because during analysis it's split into two tokens i and mac and neither of these tokens starts with i-ma. In order to make this query find i-mac you need to make it analyze wildcards:
{
  "_source": true,
  "query": {
    "simple_query_string": {
      "query": "u-1*",
      "analyze_wildcard": true,
      "default_operator": "AND"
    }
  }
}
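To see why the flag matters: without analyze_wildcard, the stem of "u-1*" is matched verbatim against the indexed tokens; with it, the stem is run through the analyzer first, so it lines up with what the index actually contains. A rough Python illustration (my own simulation of the prefix matching, not Elasticsearch internals):

```python
import re

def analyze(text):
    """Same chain as the mapping above: strip hyphens (char filter),
    tokenize on word characters, lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text.replace("-", ""))]

tokens = analyze("U-12 Soccer")  # indexed tokens: ['u12', 'soccer']

def prefix_search(tokens, query, analyze_wildcard=False):
    """query ends in '*'; compare its stem against the indexed tokens."""
    stem = query.rstrip("*")
    if analyze_wildcard:
        # run the stem through the same analysis chain: 'u-1' -> 'u1'
        stem = stem.replace("-", "").lower()
    return any(t.startswith(stem) for t in tokens)

print(prefix_search(tokens, "u-1*"))                         # False: no token starts with 'u-1'
print(prefix_search(tokens, "u-1*", analyze_wildcard=True))  # True: 'u12' starts with 'u1'
```

This matches the observed behavior in the question: "u*" works either way, but "u-1*" only matches once the wildcard stem is analyzed.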