Let's say that my Elasticsearch index has a field called "dots" which contains a string of punctuation-separated words (e.g. "first.second.third").
I need to search for e.g. "first.second" and get all entries whose "dots" field is either exactly "first.second" or starts with "first.second.".
I have trouble understanding how text querying works; at least, I have not been able to create a query that does the job.
Match phrase prefix query: Returns documents that contain the words of a provided text, in the same order as provided. The last term of the provided text is treated as a prefix, matching any words that begin with that term.
Prefix query: Returns documents that contain a specific prefix in a provided field.
Match phrase query: A phrase query matches terms up to a configurable slop (which defaults to 0) in any order. Transposed terms have a slop of 2. The analyzer can be set to control which analyzer will perform the analysis process on the text.
Match query: The match query analyzes any provided text before performing a search. This means the match query can search text fields for analyzed tokens rather than an exact term. Its analyzer parameter (optional, string) sets the analyzer used to convert the text in the query value into tokens; it defaults to the index-time analyzer mapped for the <field>.
Elasticsearch has a Path Hierarchy Tokenizer that was created for exactly this use case. Here is an example of how to set it up for your index:
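# Create a new index with a custom path_hierarchy analyzer.
# Note: the "string" field type and the "doc" mapping type below belong to older Elasticsearch releases.
# See http://www.elasticsearch.org/guide/reference/index-modules/analysis/pathhierarchy-tokenizer.html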
curl -XPUT "localhost:9200/prefix-test" -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "prefix-test-analyzer": {
          "type": "custom",
          "tokenizer": "prefix-test-tokenizer"
        }
      },
      "tokenizer": {
        "prefix-test-tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "."
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "dots": {
          "type": "string",
          "analyzer": "prefix-test-analyzer",
          //"index_analyzer": "prefix-test-analyzer", //deprecated
          "search_analyzer": "keyword"
        }
      }
    }
  }
}'
echo
# Put some test data
curl -XPUT "localhost:9200/prefix-test/doc/1" -d '{"dots": "first.second.third"}'
curl -XPUT "localhost:9200/prefix-test/doc/2" -d '{"dots": "first.second.foo-bar"}'
curl -XPUT "localhost:9200/prefix-test/doc/3" -d '{"dots": "first.baz.something"}'
curl -XPOST "localhost:9200/prefix-test/_refresh"
echo
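# Test searches.
# "first" is a leading path of every test document, so this should return all three.
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
  "query": {
    "term": {
      "dots": "first"
    }
  }
}'
echo
# "first.second" should return documents 1 and 2, but not 3.
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
  "query": {
    "term": {
      "dots": "first.second"
    }
  }
}'
echo
# The full value should return only document 2.
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
  "query": {
    "term": {
      "dots": "first.second.foo-bar"
    }
  }
}'
echo
# The same "first.second" search, written as a URI query string.
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true&q=dots:first.second"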
echo
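To see why the term queries above behave this way, here is a rough Python sketch of what the path_hierarchy tokenizer does with a "." delimiter. It is only an emulation for illustration, not Elasticsearch code: the function names path_hierarchy_tokens and term_query are made up, the index is a plain dictionary, and edge cases such as empty strings or trailing delimiters are not modelled.

```python
def path_hierarchy_tokens(text, delimiter="."):
    """Return every leading path of `text`, roughly as path_hierarchy emits them."""
    parts = text.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


# The three test documents from the script above.
docs = {
    1: "first.second.third",
    2: "first.second.foo-bar",
    3: "first.baz.something",
}

# Index time: each document is filed under all of its leading paths.
index = {}
for doc_id, value in docs.items():
    for token in path_hierarchy_tokens(value):
        index.setdefault(token, set()).add(doc_id)


def term_query(term):
    """Search time: the query text is kept as one term (keyword analyzer) and looked up as-is."""
    return sorted(index.get(term, set()))


print(path_hierarchy_tokens("first.second.third"))
print(term_query("first"))
print(term_query("first.second"))
print(term_query("first.second.foo-bar"))
```

Because only whole leading paths are indexed, a lookup for "first.second" finds documents 1 and 2, while a partial segment such as "first.sec" finds nothing.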
There is also a much easier way, as pointed out in the Elasticsearch documentation. Just use:
{
  "text_phrase_prefix" : {
    "fieldname" : "yourprefix"
  }
}
or, since 0.19.9:
{
  "match_phrase_prefix" : {
    "fieldname" : "yourprefix"
  }
}
instead of:
{
  "prefix" : {
    "fieldname" : "yourprefix"
  }
}
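One thing worth checking before settling on a prefix-style query: a prefix is matched character by character, not segment by segment. The sketch below is plain Python, not Elasticsearch code, and the function names plain_prefix and segment_prefix are made up for illustration. It compares a character prefix with the rule from the question (exactly "first.second", or starting with "first.second."). How this plays out in a real index depends on how the field is analyzed.

```python
def plain_prefix(value, prefix):
    """Character-level prefix test, similar in spirit to a prefix match on a single term."""
    return value.startswith(prefix)


def segment_prefix(value, prefix, delimiter="."):
    """The rule from the question: equal to the prefix, or continuing with a delimiter."""
    return value == prefix or value.startswith(prefix + delimiter)


samples = [
    "first.second",
    "first.second.third",
    "first.secondary",      # shares the characters, but is a different segment
    "first.baz.something",
]

for value in samples:
    print(value, plain_prefix(value, "first.second"), segment_prefix(value, "first.second"))
```

The two tests disagree only on values like "first.secondary", which is the case to try against your own data.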