ElasticSearch - Searching with hyphens in name

I have a product catalog which I am indexing in ElasticSearch using the Elastica client. I am very new to ElasticSearch BTW.

There are products in my catalog which have 't-shirt' in their names, but they don't appear in the search results if I search for 'tshirt'.

What can I do so that products named 't-shirt' also show up in the results?

I followed this tutorial and implemented the following analysis settings for my index:

'analysis' => array(
    'analyzer' => array(
        'indexAnalyzer' => array(
            'type' => 'custom',
            'tokenizer' => 'whitespace',
            'filter' => array('lowercase', 'mySnowball')
        ),
        'searchAnalyzer' => array(
            'type' => 'custom',
            'tokenizer' => 'whitespace',
            'filter' => array('lowercase', 'mySnowball')
        )
    ),
    'filter' => array(
        'mySnowball' => array(
            'type' => 'snowball',
            'language' => 'English'
        )
    )
)
Hitesh, asked May 13 '14 17:05
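To see why the analyzer above misses the match, here is a rough Python simulation of its whitespace tokenizer and lowercase filter. It is an approximation, not Elasticsearch code: the product name "Cotton T-Shirt" is hypothetical, and the snowball stemmer is left out on the assumption that it only trims word endings and does not remove hyphens.

```python
# Rough simulation of the question's analyzer: whitespace tokenizer + lowercase filter.
# Assumption: the snowball stemmer is omitted because it trims word endings
# and does not remove hyphens, so it would not change the outcome here.

def whitespace_lowercase(text: str) -> list[str]:
    """Split on runs of whitespace, then lowercase each token."""
    return [token.lower() for token in text.split()]

indexed_tokens = whitespace_lowercase("Cotton T-Shirt")  # terms stored at index time (hypothetical name)
query_tokens = whitespace_lowercase("tshirt")            # terms produced for the search

print(indexed_tokens)                           # ['cotton', 't-shirt']
print(query_tokens)                             # ['tshirt']
print(set(indexed_tokens) & set(query_tokens))  # set(): no shared term, so no hit
```

Because the whitespace tokenizer keeps "t-shirt" as a single term, the indexed term and the query term never coincide.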


People also ask

How do you find a hyphen in Elasticsearch?

Imagine that you have "m0-77", "m1-77" and "m2-77". If you search for m*-77 you will get zero hits. However, you can replace the "-" (hyphen) with AND to connect the two separated words and then search for m* AND 77, which gives the correct hits. You can do this on the client side before sending the query.
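A minimal client-side sketch of that rewrite, assuming the search is sent as a `query_string` query. The helper name `hyphen_to_and` is hypothetical. It only replaces hyphens that sit between two non-space characters, because a leading hyphen (as in `-77`) means "must not" in query_string syntax and should be left alone.

```python
import json
import re

def hyphen_to_and(user_query: str) -> str:
    """Replace a hyphen between two non-space characters with ' AND '.

    A leading hyphen (e.g. '-77') is kept, since in query_string
    syntax it is the 'must not' operator.
    """
    return re.sub(r"(?<=\S)-(?=\S)", " AND ", user_query)

rewritten = hyphen_to_and("m*-77")

# Hypothetical request body: a query_string search on the rewritten text.
body = {"query": {"query_string": {"query": rewritten}}}

print(rewritten)  # m* AND 77
print(json.dumps(body))
```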

What is the difference between text and keyword in Elasticsearch?

The crucial difference between them is that Elasticsearch analyzes text fields before storing their terms in the inverted index, while keyword fields are stored as-is, without analysis. Whether a field is analyzed determines how it behaves when queried.
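A rough Python illustration of the difference, assuming the text field uses something like the standard analyzer (approximated here as "split on non-alphanumerics, then lowercase", which ignores the real Unicode segmentation rules) and that a term query matches only an exact, unanalyzed term. All function names are hypothetical.

```python
import re

def analyze_as_text(value: str) -> list[str]:
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]

def analyze_as_keyword(value: str) -> list[str]:
    """A keyword field keeps the whole value as one unchanged term."""
    return [value]

def term_query_matches(indexed_terms: list[str], term: str) -> bool:
    """A term query looks for the exact term without analyzing it."""
    return term in indexed_terms

value = "Red T-Shirt"
text_terms = analyze_as_text(value)        # ['red', 't', 'shirt']
keyword_terms = analyze_as_keyword(value)  # ['Red T-Shirt']

print(term_query_matches(text_terms, "Red T-Shirt"))     # False
print(term_query_matches(keyword_terms, "Red T-Shirt"))  # True
print(term_query_matches(text_terms, "shirt"))           # True
```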

What is type keyword in Elasticsearch?

Not all numeric data should be mapped as a numeric field data type. Elasticsearch optimizes numeric fields, such as integer or long, for range queries. However, keyword fields are better for term and other term-level queries. Identifiers, such as an ISBN or a product ID, are rarely used in range queries.
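A sketch of what that looks like as a mapping, written as Python dicts that would be serialized to JSON. It assumes the typeless mapping format of Elasticsearch 7 and later; the field names and the ID "SKU-00042" are hypothetical. Note that a keyword field keeps a hyphenated ID like this intact as a single term.

```python
import json

# Hypothetical mapping: identifiers as keyword, price as a numeric type
# because it is likely to be used in range queries.
mapping = {
    "mappings": {
        "properties": {
            "isbn": {"type": "keyword"},
            "product_id": {"type": "keyword"},
            "price": {"type": "float"},
        }
    }
}

# Exact lookup on a keyword field vs. a range query on the numeric field.
exact_id_query = {"query": {"term": {"product_id": "SKU-00042"}}}
price_range_query = {"query": {"range": {"price": {"gte": 10, "lte": 20}}}}

print(json.dumps(mapping, indent=2))
```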


1 Answer

You can try removing the hyphen using a mapping char filter:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-mapping-charfilter.html

Something like this would remove the hyphen:

{
    "index" : {
        "analysis" : {
            "char_filter" : {
                "my_mapping" : {
                    "type" : "mapping",
                    "mappings" : ["-=>"]
                }
            },
            "analyzer" : {
                "custom_with_char_filter" : {
                    "type" : "custom",
                    "tokenizer" : "standard",
                    "char_filter" : ["my_mapping"],
                    "filter" : ["lowercase"]
                }
            }
        }
    }
}

It's something of a blunt-force instrument, since it strips every hyphen, but as long as the same analyzer is used at both index and search time it should make "t-shirt" and "tshirt" match.

John Petrone, answered Sep 30 '22 19:09
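To illustrate the effect of the mapping char filter in the answer, here is a rough Python simulation of the chain char filter, then tokenizer, then lowercase filter. It is an approximation, not Elasticsearch code: the standard tokenizer is approximated as "split on non-alphanumerics" for plain ASCII input, a lowercase filter is assumed to be in the chain (as in the question's analyzers), and the sample strings are hypothetical.

```python
import re

def strip_hyphens(text: str) -> str:
    """Mimic the mapping char filter "-=>": delete every hyphen before tokenizing."""
    return text.replace("-", "")

def standard_like_tokens(text: str) -> list[str]:
    """Rough stand-in for the standard tokenizer on ASCII text: split on non-alphanumerics."""
    return re.findall(r"[A-Za-z0-9]+", text)

def analyze(text: str, use_char_filter: bool = True) -> list[str]:
    """Optional char filter, then tokenizer, then an (assumed) lowercase filter."""
    if use_char_filter:
        text = strip_hyphens(text)
    return [token.lower() for token in standard_like_tokens(text)]

print(analyze("Blue T-Shirt"))                         # ['blue', 'tshirt']
print(analyze("tshirt"))                               # ['tshirt']
print(analyze("Blue T-Shirt", use_char_filter=False))  # ['blue', 't', 'shirt']
print(analyze("2014-05-13"))                           # ['20140513']
```

The last line shows why the approach is blunt: every hyphenated value, such as a date or part number, is merged into a single term too.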