How to match on prefix in Elasticsearch

Let's say that my Elasticsearch index has a field called "dots" which contains a string of dot-separated words (e.g. "first.second.third").

I need to search for e.g. "first.second" and get all entries whose "dots" field is either exactly "first.second" or starts with "first.second.".

I have trouble understanding how text querying works; at least, I have not been able to create a query that does the job.

Stine asked Aug 24 '12 22:08


2 Answers

Elasticsearch has a Path Hierarchy Tokenizer that was created exactly for this use case. Here is an example of how to set it up for your index:

# Create a new index with a custom path_hierarchy analyzer
# See http://www.elasticsearch.org/guide/reference/index-modules/analysis/pathhierarchy-tokenizer.html
# Note: "analyzer" is used here instead of the deprecated "index_analyzer".
curl -XPUT "localhost:9200/prefix-test" -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "prefix-test-analyzer": {
                    "type": "custom",
                    "tokenizer": "prefix-test-tokenizer"
                }
            },
            "tokenizer": {
                "prefix-test-tokenizer": {
                    "type": "path_hierarchy",
                    "delimiter": "."
                }
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "dots": {
                    "type": "string",
                    "analyzer": "prefix-test-analyzer",
                    "search_analyzer": "keyword"
                }
            }
        }
    }
}'
echo
# Put some test data
curl -XPUT "localhost:9200/prefix-test/doc/1" -d '{"dots": "first.second.third"}'
curl -XPUT "localhost:9200/prefix-test/doc/2" -d '{"dots": "first.second.foo-bar"}'
curl -XPUT "localhost:9200/prefix-test/doc/3" -d '{"dots": "first.baz.something"}'
curl -XPOST "localhost:9200/prefix-test/_refresh"
echo
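# Test searches. Each document is indexed under all of its leading paths, so:
#   term "first"                -> should match docs 1, 2 and 3
#   term "first.second"         -> should match docs 1 and 2
#   term "first.second.foo-bar" -> should match doc 2 only
#   q=dots:first.second         -> should match docs 1 and 2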
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
    "query": {
        "term": {
            "dots": "first"
        }
    }
}'
echo
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
    "query": {
        "term": {
            "dots": "first.second"
        }
    }
}'
echo
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
    "query": {
        "term": {
            "dots": "first.second.foo-bar"
        }
    }
}'
echo
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true&q=dots:first.second"
echo
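
To see why the term queries above hit, it can help to imitate the analysis step outside Elasticsearch. The following is a minimal Python sketch, not the tokenizer's actual implementation: the names `path_hierarchy_tokens` and `term_search` are made up for illustration, and the sketch assumes the tokenizer simply emits every leading path of the value, split on the delimiter.

```python
def path_hierarchy_tokens(text: str, delimiter: str = ".") -> list:
    """Approximate a path_hierarchy tokenizer: emit every leading path of text."""
    parts = text.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


# Same test data as in the curl commands above.
docs = {
    1: "first.second.third",
    2: "first.second.foo-bar",
    3: "first.baz.something",
}

# Index time: each document is stored under all of its leading paths.
index = {doc_id: set(path_hierarchy_tokens(value)) for doc_id, value in docs.items()}


def term_search(term: str) -> list:
    """Return ids of documents whose token set contains the exact, unanalyzed term."""
    return sorted(doc_id for doc_id, tokens in index.items() if term in tokens)


print(path_hierarchy_tokens("first.second.third"))
# ['first', 'first.second', 'first.second.third']
print(term_search("first.second"))
# [1, 2]
```

Because the search side uses the "keyword" analyzer, the query string stays a single term, which is why it is compared against the stored tokens as-is in this sketch.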
imotov answered Oct 01 '22 21:10


There is also a much easier way, as pointed out in the Elasticsearch documentation. Just use:

{
    "text_phrase_prefix" : {
        "fieldname" : "yourprefix"
    }
}

or since 0.19.9:

{
    "match_phrase_prefix" : {
        "fieldname" : "yourprefix"
    }
}

instead of:

{
    "prefix" : {
        "fieldname" : "yourprefix"
    }
}
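
With any prefix-style query it is worth checking whether it respects the dot boundary that the question asks for. The plain-Python sketch below only illustrates that requirement and says nothing about Elasticsearch internals; the function name `matches_dotted_prefix` and the sample values are hypothetical. Whether a particular query behaves like the strict or the loose variant depends on how the field is analyzed.

```python
def matches_dotted_prefix(value: str, prefix: str) -> bool:
    """True if value equals prefix or continues it past a dot boundary."""
    return value == prefix or value.startswith(prefix + ".")


# Hypothetical field values; "first.secondary" is the interesting edge case.
values = ["first.second", "first.second.third", "first.secondary", "first.baz"]

# Strict: what the question asks for (exact match, or prefix followed by a dot).
strict = [v for v in values if matches_dotted_prefix(v, "first.second")]

# Loose: a raw string-prefix comparison, which ignores the dot boundary.
loose = [v for v in values if v.startswith("first.second")]

print(strict)  # ['first.second', 'first.second.third']
print(loose)   # ['first.second', 'first.second.third', 'first.secondary']
```

Indexing a value such as "first.secondary" alongside the test documents is a quick way to find out which of the two behaviours a given query has.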
Macilias answered Oct 01 '22 19:10