How to search for a part of a word with ElasticSearch

Tags:

elasticsearch

People also ask

How do I search for a word in Kibana?

To search for an exact string, you need to wrap the string in double quotation marks. Without quotation marks, the search in the example would match any documents containing one of the following words: "Cannot" OR "change" OR "the" OR "info" OR "a" OR "user".

Can Elasticsearch do joins?

Out of the box, Elasticsearch does not have joins as in an SQL database. While there are potential workarounds for establishing relationships in your documents, it is important to be aware of the challenges each of these approaches presents.

Is Elasticsearch good for full-text search?

Elasticsearch is a popular open source search engine. Because of its real-time speeds and robust API, it's a popular choice among developers that need to add full-text search capabilities in their projects.

How do you search in Elasticsearch?

You can use the search API to search and aggregate data stored in Elasticsearch data streams or indices. The API's query request body parameter accepts queries written in Query DSL. The following request searches my-index-000001 using a match query. This query matches documents with a user.id value of kimchy .

I'm using nGram, too. I use standard tokenizer and nGram just as a filter. Here is my setup:

{
  "index": {
    "index": "my_idx",
    "type": "my_type",
    "analysis": {
      "index_analyzer": {
        "my_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "mynGram"
          ]
        }
      },
      "search_analyzer": {
        "my_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "mynGram"
          ]
        }
      },
      "filter": {
        "mynGram": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 50
        }
      }
    }
  }
}

Let's you find word parts up to 50 letters. Adjust the max_gram as you need. In german words can get really big, so I set it to a high value.

Searching with leading and trailing wildcards is going to be extremely slow on a large index. If you want to be able to search by word prefix, remove leading wildcard. If you really need to find a substring in a middle of a word, you would be better of using ngram tokenizer.

I think there's no need to change any mapping. Try to use query_string, it's perfect. All scenarios will work with default standard analyzer:

We have data:

{"_id" : "1","name" : "John Doeman","function" : "Janitor"}
{"_id" : "2","name" : "Jane Doewoman","function" : "Teacher"}

Scenario 1:

{"query": {
    "query_string" : {"default_field" : "name", "query" : "*Doe*"}
} }

Response:

{"_id" : "1","name" : "John Doeman","function" : "Janitor"}
{"_id" : "2","name" : "Jane Doewoman","function" : "Teacher"}

Scenario 2:

{"query": {
    "query_string" : {"default_field" : "name", "query" : "*Jan*"}
} }

Response:

{"_id" : "1","name" : "John Doeman","function" : "Janitor"}

Scenario 3:

{"query": {
    "query_string" : {"default_field" : "name", "query" : "*oh* *oe*"}
} }

Response:

{"_id" : "1","name" : "John Doeman","function" : "Janitor"}
{"_id" : "2","name" : "Jane Doewoman","function" : "Teacher"}

EDIT - Same implementation with spring data elastic search https://stackoverflow.com/a/43579948/2357869

One more explanation how query_string is better than others https://stackoverflow.com/a/43321606/2357869

without changing your index mappings you could do a simple prefix query that will do partial searches like you are hoping for

ie.

{
  "query": { 
    "prefix" : { "name" : "Doe" }
  }
}

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html

Try the solution with is described here: Exact Substring Searches in ElasticSearch

{
    "mappings": {
        "my_type": {
            "index_analyzer":"index_ngram",
            "search_analyzer":"search_ngram"
        }
    },
    "settings": {
        "analysis": {
            "filter": {
                "ngram_filter": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 8
                }
            },
            "analyzer": {
                "index_ngram": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [ "ngram_filter", "lowercase" ]
                },
                "search_ngram": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": "lowercase"
                }
            }
        }
    }
}

To solve the disk usage problem and the too-long search term problem short 8 characters long ngrams are used (configured with: "max_gram": 8). To search for terms with more than 8 characters, turn your search into a boolean AND query looking for every distinct 8-character substring in that string. For example, if a user searched for large yard (a 10-character string), the search would be:

"arge ya AND arge yar AND rge yard.

While there are a lot of answers which focuses on solving the issue at hand but don't talk much about the various trade-off which someone needs to make before choosing a particular answer. So let me try to add a few more details on this perspective.

Partial search is now a day a very common and important feature and if not implemented properly can lead to poor user experience and bad performance, so first know your application function and non-function requirement related to this feature which I talked about in my this detailed SO answer.

Now there are various approaches, like query time, index time, completion suggester and search as you type data-types added in recent version of elasticsarch.

Now people who quickly want to just implement a solution can use below end to end working solution.

Index mapping

{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    },
    "index.max_ngram_diff" : 10
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete", 
        "search_analyzer": "standard" 
      }
    }
  }
}

Index given sample docs

{
  "title" : "John Doeman"
  
}

{
  "title" : "Jane Doewoman"
  
}

{
  "title" : "Jimmy Jackal"
  
}

And search query

{
    "query": {
        "match": {
            "title": "Doe"
        }
    }
}

which returns expected search results

 "hits": [
            {
                "_index": "6467067",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.76718915,
                "_source": {
                    "title": "John Doeman"
                }
            },
            {
                "_index": "6467067",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.76718915,
                "_source": {
                    "title": "Jane Doewoman"
                }
            }
        ]

Related questions
                            
                                How to stop/shut down an elasticsearch node?
                            
                                Import/Index a JSON file into Elasticsearch
                            
                                How to handle multiple heterogeneous inputs with Logstash?
                            
                                Connection Timeout with Elasticsearch
                            
                                Redis Vs RabbitMQ as a data broker/messaging system in between Logstash and elasticsearch
                            
                                Proper access policy for Amazon Elastic Search Cluster
                            
                                Elastic Search: how to see the indexed data
                            
                                how to move elasticsearch data from one server to another
                            
                                How do I enable remote access/request in Elasticsearch 2.0?
                            
                                no [query] registered for [filtered]
                            
                                No handler for type [string] declared on field [name]
                            
                                How to change Elasticsearch max memory size
                            
                                Elasticsearch vs Cassandra vs Elasticsearch with Cassandra
                            
                                Elasticsearch: Failed to connect to localhost port 9200 - Connection refused
                            
                                How do I escape characters in GitHub code search? [duplicate]
                            
                                No mapping found for field in order to sort on in ElasticSearch
                            
                                Elasticsearch: Difference between "Term", "Match Phrase", and "Query String"
                            
                                how to rename an index in a cluster?
                            
                                Elasticsearch: Max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
                            
                                What are some use cases for using Elasticsearch versus standard sql queries? [closed]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to search for a part of a word with ElasticSearch

Tags:

elasticsearch

People also ask

Related questions

Recent Activity

Donate For Us