
Elasticsearch not returning singular/plural matches

I am using a PHP library for Elasticsearch to index and search documents on my website. This is the command that creates the index:

curl -XPUT 'http://localhost:9200/test/' -d '
{
  "index": {
    "numberOfShards": 1,
    "numberOfReplicas": 1
  }
}'

I then use curl -XPUT to add documents to the index and curl -XGET to query it. This works well, except that singular and plural forms of query words do not match each other. For example, when I search for "discussions", documents containing "discussion" are not returned, and vice versa. Why is this? I thought Elasticsearch handled this by default. Is there something we have to configure explicitly for it to match singular and plural forms?

asked Nov 09 '11 12:11 by Ninja


2 Answers

The default Elasticsearch analyzer doesn't do stemming, and stemming is what you need to handle plural/singular forms. You can try using the Snowball analyzer for your text fields to see if it works better for your use case:

curl -XPUT 'http://localhost:9200/test' -d '{
    "settings" : {
        "index" : {
            "number_of_shards" : 1,
            "number_of_replicas" : 1
        }
    },
    "mappings" : {
        "page" : {
            "properties" : {
                "mytextfield": { "type": "string",  "analyzer": "snowball", "store": "yes"}
            }
        }
    }
}'
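
To see why stemming matters here, the following is a minimal Python sketch of an inverted index, built once without stemming and once with it. It is only an illustration: `toy_stem`, `analyze`, `build_index`, and `search` are hypothetical helpers, and the crude suffix stripping is a stand-in for, not an implementation of, the Snowball algorithm. The point it tries to show is that the same analysis has to run on both the indexed text and the query for "discussion" and "discussions" to land on the same term.

```python
import re
from collections import defaultdict


def toy_stem(token):
    """Crude plural stripping for illustration only; NOT the Snowball algorithm."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text, stem):
    """Lowercase the text, split it into word tokens, and optionally stem them."""
    tokens = re.findall(r"\w+", text.lower())
    return [toy_stem(t) for t in tokens] if stem else tokens


def build_index(docs, stem):
    """Map each analyzed token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text, stem):
            index[token].add(doc_id)
    return index


def search(index, query, stem):
    """Run the query through the same analysis, then look up each token."""
    hits = set()
    for token in analyze(query, stem):
        hits |= index.get(token, set())
    return hits


docs = {
    1: "A long discussion about shards",
    2: "Several discussions on replicas",
}

plain = build_index(docs, stem=False)
stemmed = build_index(docs, stem=True)

print(sorted(search(plain, "discussions", stem=False)))   # [2]
print(sorted(search(stemmed, "discussions", stem=True)))  # [1, 2]
```

Without stemming, the index holds "discussion" and "discussions" as two unrelated terms, so each query only finds its exact form. With stemming applied at both index time and query time, both forms reduce to one term and both documents match.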
answered Oct 02 '22 08:10 by imotov


Somehow snowball is not working for me; I am getting errors, as I mentioned in my comment on @imotov's answer. I used the porter_stem token filter instead and it worked perfectly for me. This is the config I used:

curl -XPUT localhost:9200/index_name -d '
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "stem" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "lowercase", "stop", "porter_stem"]
                }
            }
        }
    },
    "mappings" : {
        "index_type_1" : {
            "dynamic" : true,
            "properties" : {
                "field1" : {
                    "type" : "string",
                    "analyzer" : "stem"
                },
                "field2" : {
                    "type" : "string",
                    "analyzer" : "stem"
                }
            }
        }
    }
}'
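
Since a custom analyzer has to be defined under "settings" before a field in "mappings" can refer to it by name, a small check before sending the request may help catch typos. The following is a rough Python sketch that rebuilds the body shown above and reports any field whose analyzer name is not defined in the same body. `undefined_analyzers` is a hypothetical helper, not part of any Elasticsearch client, and it knows nothing about built-in analyzers, so a built-in name such as "snowball" would be flagged too and would need an allow-list.

```python
import json

# Request body mirroring the settings and mappings shown above.
body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "stem": {
                    "tokenizer": "standard",
                    "filter": ["standard", "lowercase", "stop", "porter_stem"],
                }
            }
        }
    },
    "mappings": {
        "index_type_1": {
            "dynamic": True,
            "properties": {
                "field1": {"type": "string", "analyzer": "stem"},
                "field2": {"type": "string", "analyzer": "stem"},
            },
        }
    },
}


def undefined_analyzers(body):
    """Return (type, field, analyzer) triples whose analyzer is not defined in settings.

    Hypothetical helper: only custom analyzers declared in this body count as defined.
    """
    defined = set(body.get("settings", {}).get("analysis", {}).get("analyzer", {}))
    missing = []
    for type_name, mapping in body.get("mappings", {}).items():
        for field, spec in mapping.get("properties", {}).items():
            name = spec.get("analyzer")
            if name is not None and name not in defined:
                missing.append((type_name, field, name))
    return missing


print(undefined_analyzers(body))   # [] because both fields use the "stem" analyzer
print(json.dumps(body, indent=2))  # JSON text that could be passed to curl -d
```

If the check returns an empty list, the printed JSON can be used as the request body for the curl -XPUT call.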
answered Oct 02 '22 08:10 by Ninja