I have a product catalog which I am indexing in ElasticSearch using the Elastica client. I am very new to ElasticSearch BTW.
There are products in my catalog which have 't-shirt'
in their names. But, they won't appear in search results if I type 'tshirt'
.
What can I do so that 't-shirt'
can also pop-up in results?
I have followed this tutorial and implemented the following for indexes:
'analysis' => array(
'analyzer' => array(
'indexAnalyzer' => array(
'type' => 'custom',
'tokenizer' => 'whitespace',
'filter' => array('lowercase', 'mySnowball')
),
'searchAnalyzer' => array(
'type' => 'custom',
'tokenizer' => 'whitespace',
'filter' => array('lowercase', 'mySnowball')
)
),
'filter' => array(
'mySnowball' => array(
'type' => 'snowball',
'language' => 'English'
)
)
)
Imagine that you have "m0-77", "m1-77" and "m2-77", if you search m*-77 you are going to have zero hits. However you can remplace "-" (hyphen) with AND in order to connect the two separed words and then search m* AND 77 that is going to give you a correct hit. you can do it in the client front.
The crucial difference between them is that Elasticsearch will analyze the Text before it's stored into the Inverted Index while it won't analyze Keyword type. Analyzed or not analyzed will affect how it will behave when getting queried.
Not all numeric data should be mapped as a numeric field data type. Elasticsearch optimizes numeric fields, such as integer or long , for range queries. However, keyword fields are better for term and other term-level queries. Identifiers, such as an ISBN or a product ID, are rarely used in range queries.
You can try removing the hyphen using a mapping char filter:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-mapping-charfilter.html
something like this would remove the hyphen:
{
"index" : {
"analysis" : {
"char_filter" : {
"my_mapping" : {
"type" : "mapping",
"mappings" : ["-=>"]
}
},
"analyzer" : {
"custom_with_char_filter" : {
"tokenizer" : "standard",
"char_filter" : ["my_mapping"]
}
}
}
}
}
it's something of a blunt force instrument as it will strip all hyphens but it should make "t-shirt" and "tshirt" match
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With