I have the following indexed document:
{ "visitor": { "id": <SOME STRING VALUE> } }
The mapping for the document is:
"visitor": { "properties": { "id": { "type": "string" } } }
When I run the following query I get results:
{ "query": { "filtered": { "query": { "match_all": {} } }, "filter": { "term": { "visitor.id": "123" } } } }
However this does not:
{ "query": { "filtered": { "query": { "match_all": {} } }, "filter": { "term": { "visitor.id": "ABC" } } } }
I've been thinking this is related to analyzers and have been chasing that down. I've also been wondering if I was wrong to use dot notation to get to the nested visitor property.
Can anyone tell me why I can't filter for the visitor with the id "ABC" but can for the visitor with the id "123"?
The terms query returns documents that contain one or more exact terms in a provided field. The terms query is the same as the term query, except that you can search for multiple values. Warning: avoid using the term query for text fields.
To better search text fields, the match query also analyzes your provided search term before performing a search. This means the match query can search text fields for analyzed tokens rather than an exact term. The term query does not analyze the search term. The term query only searches for the exact term you provide.
Term query. Returns documents that contain an exact term in a provided field. You can use the term query to find documents based on a precise value such as a price, a product ID, or a username. Avoid using the term query for text fields.
The full text queries enable you to search analyzed text fields such as the body of an email. The query string is processed using the same analyzer that was applied to the field during indexing.
You need to understand how Elasticsearch's analyzers work. An analyzer runs a tokenizer (which splits the input into a bunch of tokens, for example on whitespace) followed by a set of token filters (which filter out tokens you don't want, like stop words, or modify tokens, like the lowercase token filter, which converts everything to lower case).
Analysis is performed at two very specific times - during indexing (when you put stuff into elasticsearch) and, depending on your query, during searching (on the string you're searching for).
That said, the default analyzer is the standard analyzer which consists of a standard tokenizer, standard token filter (to clean up tokens from the standard tokenizer), lowercase token filter, and stop words token filter.
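As a rough illustration of that tokenizer-plus-filters pipeline, here is a minimal Python sketch. It is not Elasticsearch's standard analyzer: the regex tokenizer, the tiny STOP_WORDS set, and every function name below are assumptions made up for the example.

```python
import re

# Hypothetical stop word list, far shorter than any real one.
STOP_WORDS = {"the", "a", "an", "and", "of"}

def tokenize(text):
    # Tokenizer step: split the input into runs of letters and digits.
    return re.findall(r"[A-Za-z0-9]+", text)

def lowercase_filter(tokens):
    # Token filter that modifies tokens.
    return [t.lower() for t in tokens]

def stop_filter(tokens):
    # Token filter that removes unwanted tokens.
    return [t for t in tokens if t not in STOP_WORDS]

def toy_analyze(text):
    # Run the tokenizer, then each token filter in order.
    tokens = tokenize(text)
    for token_filter in (lowercase_filter, stop_filter):
        tokens = token_filter(tokens)
    return tokens

print(toy_analyze("The Quick Brown Fox"))  # ['quick', 'brown', 'fox']
```

The order of the filters matters here: lowercasing runs first so that "The" is recognised as the stop word "the".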
To put this into an example: when you save the string "I love Vincent's pie!" into Elasticsearch using the default standard analyzer, you're actually storing "i", "love", "vincent", "s", "pie". Then, when you attempt to search for "Vincent's" with a term query (which is not analyzed), you will not find anything because "Vincent's" is not one of those tokens! However, if you search for "Vincent's" using a match query (which is analyzed), you will find "I love Vincent's pie!" because "vincent" and "s" both find matches. The same applies to your ids: "ABC" is indexed as the lowercased token "abc", so a term filter for "ABC" finds nothing, while "123" is unaffected by lowercasing and still matches.
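The difference between the two lookups in that example can be simulated in a few lines of Python. This is only a sketch under simplifying assumptions: toy_analyze is a made-up stand-in that splits on non-alphanumerics and lowercases, and the "index" is a plain set of tokens rather than anything Elasticsearch actually builds.

```python
import re

def toy_analyze(text):
    # Simplified analysis: split on non-alphanumerics, then lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

# Index time: the document text is analyzed and its tokens are stored.
indexed_tokens = set(toy_analyze("I love Vincent's pie!"))

def term_matches(search_term):
    # A term query looks up the search string exactly as given.
    return search_term in indexed_tokens

def match_matches(search_text):
    # A match query analyzes the search string first;
    # in this sketch, any resulting token may match.
    return any(t in indexed_tokens for t in toy_analyze(search_text))

print(term_matches("Vincent's"))   # False
print(match_matches("Vincent's"))  # True
print(term_matches("vincent"))     # True
```

The last line shows that a term lookup does succeed once the search string is already in the form the analyzer produced.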
The bottom line: either use an analyzed query, such as match, when searching natural language strings, or keep the field out of analysis so that term filters see the exact value you indexed. You can set the field up to not use an analyzer with the following mapping, which should suit your needs:
"visitor": { "properties": { "id": { "type": "string", "index": "not_analyzed" } } }
See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html for further reading.
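To see why the not_analyzed mapping helps with the ids from the question, here is a small Python sketch. It assumes that analyzing a single-word id amounts to lowercasing it and that a not_analyzed field stores the value unchanged as one token; index_value and term_filter are hypothetical helpers, not Elasticsearch calls.

```python
def index_value(value, analyzed):
    # Assumption: analysis of a single-word id just lowercases it;
    # a not_analyzed field keeps the value unchanged as one token.
    return {value.lower()} if analyzed else {value}

def term_filter(stored_tokens, term):
    # Term filters compare the term exactly as given, without analysis.
    return term in stored_tokens

for visitor_id in ("123", "ABC"):
    for analyzed in (True, False):
        hit = term_filter(index_value(visitor_id, analyzed), visitor_id)
        print(visitor_id, "analyzed" if analyzed else "not_analyzed", hit)
```

Only the combination of "ABC" with an analyzed field misses, which is the behaviour described in the question.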