I have documents that I want to index in ElasticSearch that contains a text field called name
. I currently index the name using the snowball
analyzer. However, I would like to match names both with and without included spaces. For example, a document with the name "The Home Depot" should match "homedepot", "home", and "home depot". Additionally, documents with a single word name like "ExxonMobil" should match "exxon mobil" and "exxonmobil".
I can't seem to find the right combination of analyzer/filters to accomplish this.
The match query analyzes any provided text before performing a search. This means the match query can search text fields for analyzed tokens rather than an exact term. (Optional, string) Analyzer used to convert the text in the query value into tokens. Defaults to the index-time analyzer mapped for the <field> .
Either the query_string query or the match query would be what you're looking for. query_string will use the special _all field if none is specified in default_field , so that would work out well. And with match you can just specify the _all as well. Save this answer.
Full-text search queries and performs linguistic searches against documents. It includes single or multiple words or phrases and returns documents that match search condition. ElasticSearch is a search engine based on Apache Lucene, a free and open-source information retrieval software library.
You can use the search API to search and aggregate data stored in Elasticsearch data streams or indices. The API's query request body parameter accepts queries written in Query DSL. The following request searches my-index-000001 using a match query. This query matches documents with a user.id value of kimchy .
I think the most direct approach to this problem would be to apply a Shingle token filter, which, instead of creating ngrams of characters, creates combinations of incoming tokens. You can add it to your analyzer something like:
filter:
........
my_shingle_filter:
type: shingle
min_shingle_size: 2
max_shingle_size: 3
output_unigrams: true
token_separator: ""
you should be mindful of where this filter is placed in your filter chain. It should probably come late in the chain, after all token separation/removal/replacement has already occurred (ie. after any StopFilters, SynonymFilters, stemmers, etc).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With