I am using a PHP library for Elasticsearch to index and search documents on my website. This is the command I use to create the index:
curl -XPUT 'http://localhost:9200/test/' -d '
{
"index": {
"numberOfShards": 1,
"numberOfReplicas": 1
}
}'
I then use curl -XPUT to add documents to the index and curl -XGET to query it. This works well, except that singular and plural forms of query words are not matched against each other. For example, when I search for "discussions", documents containing "discussion" are not returned, and vice versa. Why is this so? I thought Elasticsearch handled this by default. Do we have to configure something explicitly for it to match singular and plural forms?
If a search request results in more than ten hits, Elasticsearch will, by default, return only the first ten. To override that default and retrieve more or fewer hits, add a size parameter to the search request body.
There are two recommended methods to retrieve selected fields from a search query: Use the fields option to extract the values of fields present in the index mapping. Use the _source option if you need to access the original data that was passed at index time.
The size parameter is the maximum number of hits to return. Together with the from parameter, which sets the offset of the first hit, it defines a page of results.
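As a rough illustration of paging with from and size, the sketch below builds a search body for a given page. The index name test comes from the question; the page size of 20, the match_all query, and the helper names page_body and search_page are assumptions made for this example only.

```shell
# Assumed values: a local node, and an illustrative page size of 20.
ES_URL='http://localhost:9200'
PAGE_SIZE=20

# Build a search body for a zero-based page number.
# "from" is the offset of the first hit, "size" the number of hits per page.
page_body() {
  local page="$1"
  local from=$(( page * PAGE_SIZE ))
  printf '{"from": %d, "size": %d, "query": {"match_all": {}}}' "$from" "$PAGE_SIZE"
}

# Send the request (needs a running Elasticsearch node).
# The Content-Type header is required by newer releases and harmless on older ones.
search_page() {
  curl -s -XGET "$ES_URL/test/_search" \
    -H 'Content-Type: application/json' \
    -d "$(page_body "$1")"
}
```

Calling search_page 2 would request hits 40 through 59 of the test index.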
Returns documents that contain an exact term in a provided field. You can use the term query to find documents based on a precise value such as a price, a product ID, or a username. Avoid using the term query for text fields. By default, Elasticsearch changes the values of text fields as part of analysis.
The default Elasticsearch analyzer doesn't do stemming, and stemming is what you need to match plural and singular forms. You can try the Snowball analyzer on your text fields to see if it works better for your use case:
curl -XPUT 'http://localhost:9200/test' -d '{
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1
}
},
"mappings" : {
"page" : {
"properties" : {
"mytextfield": { "type": "string", "analyzer": "snowball", "store": "yes"}
}
}
}
}'
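One way to check whether stemming is actually applied is to run both word forms through the _analyze endpoint and compare the tokens. This is a minimal sketch, assuming an older Elasticsearch release (the ones that still accept the string type used above), where _analyze takes the analyzer as a URL parameter and the text as the request body; newer releases expect a JSON body instead. The helper names analyze_url and analyze are hypothetical.

```shell
# Assumed: a local node, as in the commands above.
ES_URL='http://localhost:9200'

# Build the _analyze URL for an index ($1) and an analyzer name ($2).
analyze_url() {
  printf '%s/%s/_analyze?analyzer=%s' "$ES_URL" "$1" "$2"
}

# Print the tokens the analyzer ($2) produces for the text ($3).
# Needs a running Elasticsearch node.
analyze() {
  curl -s -XGET "$(analyze_url "$1" "$2")" -d "$3"
}
```

Running analyze test snowball 'discussions' and analyze test snowball 'discussion' should report the same token for both inputs if the stemmer is working.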
Somehow snowball is not working for me; I am getting errors like the ones I mentioned in my comment on @imotov's answer. I used the porter_stem filter instead and it worked perfectly. This is the config I used:
curl -XPUT localhost:9200/index_name -d '
{
"settings" : {
"analysis" : {
"analyzer" : {
"stem" : {
"tokenizer" : "standard",
"filter" : ["standard", "lowercase", "stop", "porter_stem"]
}
}
}
},
"mappings" : {
"index_type_1" : {
"dynamic" : true,
"properties" : {
"field1" : {
"type" : "string",
"analyzer" : "stem"
},
"field2" : {
"type" : "string",
"analyzer" : "stem"
}
}
}
}
}'
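With that mapping in place, a match query against one of the stemmed fields should find both word forms, because the query text is analyzed with the same analyzer as the field. The sketch below assumes the index_name index and field1 field from the config above; the helper names match_body and search_field are hypothetical, and the body builder does no JSON escaping, so it only suits simple search terms.

```shell
# Assumed: a local node, as in the commands above.
ES_URL='http://localhost:9200'

# Build a match query body for a field ($1) and a search term ($2).
# No JSON escaping is done, so the term must not contain quotes or backslashes.
match_body() {
  printf '{"query": {"match": {"%s": "%s"}}}' "$1" "$2"
}

# Run the query against index_name (needs a running Elasticsearch node).
# The Content-Type header is required by newer releases and harmless on older ones.
search_field() {
  curl -s -XGET "$ES_URL/index_name/_search" \
    -H 'Content-Type: application/json' \
    -d "$(match_body "$1" "$2")"
}
```

For example, search_field field1 discussions and search_field field1 discussion should return the same documents if the stem analyzer is applied to field1.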