Elasticsearch completion - generating input list with analyzers

I've had a look at this article: https://www.elastic.co/blog/you-complete-me. However, it requires writing some logic in the client to create the multiple "input" values for the completion suggester. Is there a way to define an analyzer (maybe using shingle or ngram/edge-ngram) that will generate those input terms for me?
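For context, here is roughly what that client-side approach looks like: the "input" list is written by hand (the values below just illustrate what I'd like an analyzer to generate for me):

PUT /products/product/1
{
    "name": "Apple iPhone 5",
    "name_suggest": {
        "input": ["apple", "iphone", "5", "apple iphone", "iphone 5", "apple iphone 5"]
    }
}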

Here's what I tried (and it obviously doesn't work):

DELETE /products/
PUT /products/
{
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type":"shingle",
                    "max_shingle_size":5,
                    "min_shingle_size":2
                }
            },
            "analyzer": {
                "autocomplete": {
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ],
                    "tokenizer": "standard"
                }
            }
        }
    }, 
    "mappings": {
        "product": {
            "properties": {
                "name": {"type": "string"
                ,"copy_to": ["name_suggest"]
                }
                ,"name_suggest": {
                    "type": "completion",
                    "payloads": false,
                    "analyzer": "autocomplete"
                }
            }
        }
    }
}

PUT /products/product/1
{
    "name": "Apple iPhone 5"
}

PUT /products/product/2
{
    "name": "iPhone 4 16GB"
}

PUT /products/product/3
{
    "name": "iPhone 3 GS 16GB black"
}

PUT /products/product/4
{
    "name": "Apple iPhone 4 S 16 GB white"
}

PUT /products/product/5
{
    "name": "Apple iPhone case"
}

POST /products/_suggest
{
    "suggestions": {
        "text":"i"
        ,"completion":{
            "field": "name_suggest"
        }
    }
}
asked Jul 16 '15 by Lewis Diamond


People also ask

How does Elasticsearch implement autocomplete?

Autocomplete can be achieved by changing match queries to prefix queries. While match queries match indexed tokens against the search query's tokens, prefix queries (as their name suggests) match every indexed token that starts with a search token, so the number of matching documents (results) is higher.
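For example, a minimal prefix query against the name field used in this question might look like this (an illustrative sketch, not taken from the original post):

POST /products/_search
{
    "query": {
        "prefix": {
            "name": "iph"
        }
    }
}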

What is difference between analyzer and Tokenizer in Elasticsearch?

Elasticsearch analyzers and normalizers are used to convert text into tokens that can be searched. Analyzers use a tokenizer to produce one or more tokens per text field. Normalizers use only character filters and token filters to produce a single token.

What are analyzers in Elasticsearch?

In a nutshell, an analyzer tells Elasticsearch how text should be indexed and searched. If you want to see what an analyzer does, the Analyze API is a very nice tool for understanding how analyzers work: you provide the text directly to the API, independent of any indexed document.
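For example, running the question's autocomplete analyzer through the Analyze API would look roughly like this (the exact request syntax varies between Elasticsearch versions); it should return the unigrams and shingles apple, iphone, 5, apple iphone, iphone 5 and apple iphone 5:

POST /products/_analyze
{
    "analyzer": "autocomplete",
    "text": "Apple iPhone 5"
}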


1 Answer

I don't think there's a direct way to achieve this. I'm also not sure why storing ngrammed tokens would be needed, considering Elasticsearch already stores the 'input' text as an FST structure. Newer releases also allow for fuzziness in the suggest query: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html#fuzzy
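For example, a fuzzy completion suggest (a sketch based on the documentation linked above) can tolerate a typo like this:

POST /products/_suggest
{
    "suggestions": {
        "text": "iphnoe",
        "completion": {
            "field": "name_suggest",
            "fuzzy": {
                "fuzziness": 2
            }
        }
    }
}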

I can understand the need for something like a shingle analyzer to generate the inputs for you, but there doesn't seem to be a way to do that yet. That said, the _analyze endpoint can be used to generate tokens from the analyzer of your choice, and those tokens can be passed to the 'input' field (with or without any other added logic). This way you won't have to replicate your analyzer logic in your application code. That's the only way I can think of to achieve the desired input field.
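As a sketch of that workflow (the token list below is what the question's shingle analyzer should produce; I haven't verified it against a live cluster): first generate the tokens with _analyze, then index them as the completion inputs:

POST /products/_analyze
{
    "analyzer": "autocomplete",
    "text": "Apple iPhone case"
}

PUT /products/product/5
{
    "name": "Apple iPhone case",
    "name_suggest": {
        "input": ["apple", "iphone", "case", "apple iphone", "iphone case", "apple iphone case"]
    }
}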

Hope it helps.

answered Nov 02 '22 by Archit Saxena