
ElasticSearch - Searching with hyphens

Elasticsearch 1.6

I want to index text that contains hyphens, for example U-12, U-17, WU-12, t-shirt... and to be able to use a "Simple Query String" query to search on them.

Data sample (simplified):

{"title":"U-12 Soccer",
 "comment": "the t-shirts are dirty"}

As there are already quite a lot of questions about hyphens, I tried the following solution:

Use a Char filter: ElasticSearch - Searching with hyphens in name.

So I went for this mapping:

{
  "settings":{
    "analysis":{
      "char_filter":{
        "myHyphenRemoval":{
          "type":"mapping",
          "mappings":[
            "-=>"
          ]
        }
      },
      "analyzer":{
        "default":{
          "type":"custom",
          "char_filter":  [ "myHyphenRemoval" ],
          "tokenizer":"standard",
          "filter":[
            "standard",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings":{
    "test":{
      "properties":{
        "title":{
          "type":"string"
        },
        "comment":{
          "type":"string"
        }
      }
    }
  }
}
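
To see what this analyzer is expected to put into the index, here is a rough Python sketch of the chain (char filter, then tokenizer, then lowercase). It is only an approximation: the regex `\w+` stands in for the standard tokenizer, and the function names `strip_hyphens` and `analyze` are made up for illustration, they are not part of Elasticsearch.

```python
import re
from typing import List


def strip_hyphens(text: str) -> str:
    """Mimic the "myHyphenRemoval" mapping char filter ("-=>")."""
    return text.replace("-", "")


def analyze(text: str, use_char_filter: bool = True) -> List[str]:
    """Approximate the custom analyzer: char filter, tokenizer, lowercase.

    Assumption: r"\\w+" is a crude stand-in for the standard tokenizer.
    """
    if use_char_filter:
        text = strip_hyphens(text)
    tokens = re.findall(r"\w+", text)
    return [token.lower() for token in tokens]


print(analyze("U-12 Soccer"))                         # ['u12', 'soccer']
print(analyze("the t-shirts are dirty"))              # ['the', 'tshirts', 'are', 'dirty']
print(analyze("U-12 Soccer", use_char_filter=False))  # ['u', '12', 'soccer']
```

With the char filter in place, "U-12" ends up as the single token "u12"; without it, the hyphen splits the text into "u" and "12".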

Searching is done with the following query:

{"_source":true,
  "query":{
    "simple_query_string":{
      "query":"<Text>",
      "default_operator":"AND"
    }
  }
}
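
For trying out different search texts, the request body above can be generated with a small helper. This is just a sketch; `build_search_body` is a hypothetical name, and the resulting dictionary would still have to be sent to the `_search` endpoint by whatever client you use.

```python
import json


def build_search_body(text: str) -> dict:
    """Build the simple_query_string request body for the given search text."""
    return {
        "_source": True,
        "query": {
            "simple_query_string": {
                "query": text,
                "default_operator": "AND",
            }
        },
    }


# Print the JSON body for one of the failing searches.
print(json.dumps(build_search_body("u-1*"), indent=2))
```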

  1. What works:

    "U-12", "U*", "t*", "ts*"

  2. What doesn't work:

    "U-*", "u-1*", "t-*", "t-sh*", ...

So it seems the char filter is not executed on search strings? What could I do to make this work?

Roeland Van Heddegem asked Jun 18 '15 13:06



2 Answers

If anyone is still looking for a simple workaround to this issue, replace each hyphen with an underscore (_) when indexing the data.

For example, O-000022334 should be indexed as O_000022334.

Apply the same replacement to the search text before querying, and convert the underscores back to hyphens when displaying results. This way a user can search for "O-000022334" and it will find the correct match.
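
A minimal sketch of this workaround in Python, assuming the replacement is done in application code before the data or the search text reaches Elasticsearch. The helper names `to_index_form` and `to_display_form` are hypothetical.

```python
def to_index_form(value: str) -> str:
    """Replace hyphens with underscores before indexing or searching."""
    return value.replace("-", "_")


def to_display_form(value: str) -> str:
    """Turn underscores back into hyphens before showing a result."""
    return value.replace("_", "-")


stored = to_index_form("O-000022334")
print(stored)                   # O_000022334
print(to_display_form(stored))  # O-000022334
```

Note that the round trip is only lossless if the original values never contain underscores themselves; otherwise those would be shown as hyphens too.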

Jesal answered Sep 20 '22 08:09


The answer is really simple:

Quote from Igor Motov: Configuring the standard tokenizer

By default the simple_query_string query doesn't analyze the words with wildcards. As a result it searches for all tokens that start with i-ma. The word i-mac doesn't match this request because during analysis it's split into two tokens i and mac and neither of these tokens starts with i-ma. In order to make this query find i-mac you need to make it analyze wildcards:

{
  "_source":true,
  "query":{
    "simple_query_string":{
      "query":"u-1*",
      "analyze_wildcard":true,
      "default_operator":"AND"
    }
  }
}
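
The effect of analyze_wildcard on the data from the question can be imitated in plain Python. This is a simplified model, not how Elasticsearch is implemented: the regex tokenizer is an approximation, the helper names are invented, and it assumes that unanalyzed prefixes are still lowercased (the default of lowercase_expanded_terms in this version, as far as I know), which would explain why "U*" already worked.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Approximate the analyzer from the question: strip hyphens, split, lowercase."""
    text = text.replace("-", "")
    return [token.lower() for token in re.findall(r"\w+", text)]


# Tokens that the sample document would roughly produce at index time.
INDEXED_TOKENS = analyze("U-12 Soccer") + analyze("the t-shirts are dirty")


def prefix_matches(query: str, analyze_wildcard: bool) -> List[str]:
    """Return the indexed tokens matched by a trailing-wildcard query."""
    prefix = query.rstrip("*")
    if analyze_wildcard:
        # The prefix goes through the analyzer, so the hyphen is removed.
        # Simplification: all resulting tokens are joined into one prefix.
        prefix = "".join(analyze(prefix))
    else:
        # Assumption: the prefix is only lowercased, the hyphen stays.
        prefix = prefix.lower()
    return [token for token in INDEXED_TOKENS if token.startswith(prefix)]


print(prefix_matches("u-1*", analyze_wildcard=False))   # []
print(prefix_matches("u-1*", analyze_wildcard=True))    # ['u12']
print(prefix_matches("t-sh*", analyze_wildcard=True))   # ['tshirts']
print(prefix_matches("U*", analyze_wildcard=False))     # ['u12']
```

Without analysis the prefix "u-1" still contains the hyphen, and no indexed token starts with it; once the prefix is analyzed it becomes "u1", which matches "u12".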

Roeland Van Heddegem answered Sep 20 '22 08:09