
How to search for a part of a word with ElasticSearch



I'm using nGram too: I use the standard tokenizer and nGram only as a token filter. Here is my setup:

{
  "settings": {
    "index.max_ngram_diff": 48,
    "analysis": {
      "analyzer": {
        "my_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "mynGram"
          ]
        },
        "my_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "mynGram"
          ]
        }
      },
      "filter": {
        "mynGram": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 50
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_index_analyzer",
        "search_analyzer": "my_search_analyzer"
      }
    }
  }
}

This lets you find word parts of up to 50 letters. Adjust max_gram as needed: in German, words can get really long, so I set it to a high value.
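As a rough illustration of what the `mynGram` filter does to each lowercased token, here is a minimal Python sketch that enumerates every substring whose length lies between `min_gram` and `max_gram`. It models only the set of grams, not the order or position data Elasticsearch actually stores, and the `ngrams` function name is hypothetical.

```python
def ngrams(token: str, min_gram: int, max_gram: int) -> set[str]:
    """Return every substring of `token` whose length is in [min_gram, max_gram]."""
    grams = set()
    for start in range(len(token)):
        for length in range(min_gram, max_gram + 1):
            end = start + length
            if end > len(token):
                break  # the remaining grams from this start would run past the token
            grams.add(token[start:end])
    return grams


# "doeman" is what the standard tokenizer + lowercase filter emit for "Doeman".
grams = ngrams("doeman", 2, 50)
print(sorted(grams))
print("oem" in grams)  # a mid-word fragment is now an indexed term
```

Each gram becomes a separate indexed term, which is why a large `max_gram` on long words noticeably increases index size.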


Searching with leading and trailing wildcards is going to be extremely slow on a large index. If you want to search by word prefix, remove the leading wildcard. If you really need to find a substring in the middle of a word, you would be better off using the ngram tokenizer.
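To see why the leading wildcard hurts, here is a deliberately simplified Python model of a term dictionary as a sorted list (Lucene really uses a more compact FST-based structure, so this only illustrates the access pattern). A prefix can be located with a binary search followed by a short contiguous read, while a pattern such as `*oem*` has to be tested against every term. The helper names are hypothetical.

```python
import bisect
import fnmatch


def prefix_terms(sorted_terms: list[str], prefix: str) -> list[str]:
    """Prefix lookup: binary-search to the first candidate, then read a contiguous run."""
    start = bisect.bisect_left(sorted_terms, prefix)
    result = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order means no later term can share the prefix
        result.append(term)
    return result


def infix_terms(sorted_terms: list[str], pattern: str) -> list[str]:
    """Leading-wildcard lookup: no ordering to exploit, so every term is checked."""
    return [t for t in sorted_terms if fnmatch.fnmatchcase(t, pattern)]


terms = sorted(["doeman", "doewoman", "jane", "jimmy", "jackal", "john"])
print(prefix_terms(terms, "doe"))
print(infix_terms(terms, "*oem*"))
```

With millions of unique terms, the linear scan in `infix_terms` is what makes leading-wildcard queries slow.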


I think there's no need to change any mapping. Try query_string with wildcards: all the scenarios below work with the default standard analyzer.

Given this data:

{"_id" : "1","name" : "John Doeman","function" : "Janitor"}
{"_id" : "2","name" : "Jane Doewoman","function" : "Teacher"}

Scenario 1:

{"query": {
    "query_string" : {"default_field" : "name", "query" : "*Doe*"}
} }

Response:

{"_id" : "1","name" : "John Doeman","function" : "Janitor"}
{"_id" : "2","name" : "Jane Doewoman","function" : "Teacher"}

Scenario 2:

{"query": {
    "query_string" : {"default_field" : "name", "query" : "*Jan*"}
} }

Response:

{"_id" : "2","name" : "Jane Doewoman","function" : "Teacher"}

Scenario 3:

{"query": {
    "query_string" : {"default_field" : "name", "query" : "*oh* *oe*"}
} }

Response:

{"_id" : "1","name" : "John Doeman","function" : "Janitor"}
{"_id" : "2","name" : "Jane Doewoman","function" : "Teacher"}
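The scenarios can be approximated with a small Python model. It assumes the standard analyzer reduces to lowercasing and splitting on non-word characters (a simplification), that wildcard terms are lowercased before matching, and that `query_string` combines terms with its default `OR` operator. Only the `name` field is searched, so `Janitor`, which lives in `function`, is never consulted; under these assumptions `*Jan*` matches `Jane Doewoman`. The function names are hypothetical.

```python
import fnmatch
import re

DOCS = {
    "1": {"name": "John Doeman", "function": "Janitor"},
    "2": {"name": "Jane Doewoman", "function": "Teacher"},
}


def analyze(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: split on non-word chars, lowercase."""
    return [tok.lower() for tok in re.split(r"\W+", text) if tok]


def query_string_wildcards(query: str, default_field: str) -> list[str]:
    """Ids of docs where ANY whitespace-separated wildcard term matches a token (OR)."""
    patterns = [p.lower() for p in query.split()]
    hits = []
    for doc_id, doc in DOCS.items():
        tokens = analyze(doc[default_field])
        if any(fnmatch.fnmatchcase(tok, pat) for pat in patterns for tok in tokens):
            hits.append(doc_id)
    return hits


for q in ["*Doe*", "*Jan*", "*oh* *oe*"]:
    print(q, query_string_wildcards(q, "name"))
```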

EDIT: The same implementation with Spring Data Elasticsearch: https://stackoverflow.com/a/43579948/2357869

A further explanation of why query_string is better than the alternatives: https://stackoverflow.com/a/43321606/2357869


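Without changing your index mappings, you could use a simple prefix query, which does partial searches like you are hoping for. Note that the prefix value is not analyzed, so write it in lowercase to match the terms the standard analyzer indexed, and that it only matches from the start of a term.

For example:

{
  "query": { 
    "prefix" : { "name" : "doe" }
  }
}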

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html
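A `prefix` query is a term-level query, so its value is compared without analysis, while `name` was indexed through the standard analyzer and therefore stored in lowercase. This simplified Python model (the list stands in for the indexed tokens, and `prefix_query` is a hypothetical helper) shows why the casing of the prefix matters. Newer Elasticsearch versions also accept a `case_insensitive` option on `prefix`; check that your version supports it before relying on it.

```python
# Tokens the standard analyzer would index for "John Doeman" and "Jane Doewoman".
INDEXED_TERMS = ["john", "doeman", "jane", "doewoman"]


def prefix_query(terms: list[str], value: str) -> list[str]:
    """Model of a term-level prefix query: the value is compared as-is, with no analysis."""
    return [t for t in terms if t.startswith(value)]


print(prefix_query(INDEXED_TERMS, "Doe"))  # [] because "Doe" is not lowercased
print(prefix_query(INDEXED_TERMS, "doe"))
```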


Try the solution which is described here: Exact Substring Searches in ElasticSearch

{
    "settings": {
        "index.max_ngram_diff": 5,
        "analysis": {
            "filter": {
                "ngram_filter": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 8
                }
            },
            "analyzer": {
                "index_ngram": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [ "ngram_filter", "lowercase" ]
                },
                "search_ngram": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [ "lowercase" ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "index_ngram",
                "search_analyzer": "search_ngram"
            }
        }
    }
}

To keep disk usage down and to cope with overly long search terms, only short ngrams of at most 8 characters are indexed (configured with "max_gram": 8). To search for a term longer than 8 characters, turn it into a boolean AND query over every distinct 8-character substring of that string. For example, if a user searched for large yard (a 10-character string, including the space), the search would be:

"large ya" AND "arge yar" AND "rge yard"
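A minimal Python sketch of that splitting step follows. It assumes the search text only needs lowercasing (matching `search_ngram`, whose `keyword` tokenizer keeps spaces) and builds a `bool` query with one `match` clause per distinct 8-character window, so every window must be present. The `name` field and the `windows` and `build_substring_query` helpers are hypothetical.

```python
import json

MAX_GRAM = 8  # must equal max_gram of ngram_filter


def windows(text: str, size: int = MAX_GRAM) -> list[str]:
    """Every distinct `size`-character substring of `text`, in order of first appearance.

    Texts shorter than min_gram (3 here) have no matching indexed grams at all.
    """
    text = text.lower()
    if len(text) <= size:
        return [text]
    # dict.fromkeys drops duplicate windows while preserving their order.
    return list(dict.fromkeys(text[i:i + size] for i in range(len(text) - size + 1)))


def build_substring_query(field: str, text: str) -> dict:
    """AND together one match clause per window (bool.must requires all of them)."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: w}} for w in windows(text)]
            }
        }
    }


print(windows("large yard"))
print(json.dumps(build_substring_query("name", "large yard"), indent=2))
```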


There are many answers that focus on solving the issue at hand, but they don't say much about the trade-offs you need to weigh before choosing one. So let me add a few details from that perspective.

Partial search is now a very common and important feature, and if it is not implemented properly it can lead to a poor user experience and bad performance. So first understand your application's functional and non-functional requirements for this feature, which I discussed in my detailed SO answer.

There are various approaches: query-time, index-time, the completion suggester, and the search_as_you_type data type added in recent versions of Elasticsearch.

If you just want to implement a solution quickly, you can use the end-to-end working example below.

Index mapping

{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    },
    "index.max_ngram_diff" : 10
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete", 
        "search_analyzer": "standard" 
      }
    }
  }
}
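On Elasticsearch 7 and later, index creation is rejected when an ngram filter's `max_gram - min_gram` exceeds `index.max_ngram_diff` (default 1), which is why that setting appears above. Here is a small Python sketch of a pre-flight check on a settings body before sending it; `ngram_diff_problems` is a hypothetical helper, not an Elasticsearch API, and it reads only the flat dotted key used in this example.

```python
DEFAULT_MAX_NGRAM_DIFF = 1  # default value of index.max_ngram_diff


def ngram_diff_problems(settings: dict) -> list[str]:
    """Report ngram filters whose max_gram - min_gram exceeds index.max_ngram_diff.

    Only the flat "index.max_ngram_diff" key is read; a nested
    {"index": {"max_ngram_diff": ...}} form is not handled here.
    """
    allowed = settings.get("index.max_ngram_diff", DEFAULT_MAX_NGRAM_DIFF)
    filters = settings.get("analysis", {}).get("filter", {})
    problems = []
    for name, spec in filters.items():
        # "nGram" is the older spelling of the same filter type.
        if spec.get("type", "").lower() != "ngram":
            continue
        # The ngram filter defaults are min_gram=1, max_gram=2.
        diff = spec.get("max_gram", 2) - spec.get("min_gram", 1)
        if diff > allowed:
            problems.append(f"{name}: max_gram - min_gram = {diff} > {allowed}")
    return problems


settings = {
    "analysis": {
        "filter": {
            "autocomplete_filter": {"type": "ngram", "min_gram": 1, "max_gram": 10}
        }
    },
    "index.max_ngram_diff": 10,
}
print(ngram_diff_problems(settings))  # → []

del settings["index.max_ngram_diff"]
print(ngram_diff_problems(settings))  # the diff of 9 now exceeds the default of 1
```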

Index the sample docs:

{ "title": "John Doeman" }

{ "title": "Jane Doewoman" }

{ "title": "Jimmy Jackal" }

And the search query:

{
    "query": {
        "match": {
            "title": "Doe"
        }
    }
}

which returns the expected search results:

"hits": [
  {
    "_index": "6467067",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.76718915,
    "_source": {
      "title": "John Doeman"
    }
  },
  {
    "_index": "6467067",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.76718915,
    "_source": {
      "title": "Jane Doewoman"
    }
  }
]
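Why does `Doe` find both Doe names but not `Jimmy Jackal`? The simplified Python model below assumes the standard tokenizer can be approximated by splitting on non-word characters. It indexes the 1- to 10-character grams of each lowercased token (the `autocomplete` analyzer), analyzes the query with the plain `standard` analyzer, and counts a document as a hit if any query token is among its grams. Scoring is not modeled, and all names are hypothetical.

```python
import re

DOCS = {"1": "John Doeman", "2": "Jane Doewoman", "3": "Jimmy Jackal"}


def standard_tokens(text: str) -> list[str]:
    """Approximation of the standard analyzer: split on non-word chars and lowercase."""
    return [t.lower() for t in re.split(r"\W+", text) if t]


def autocomplete_terms(text: str, min_gram: int = 1, max_gram: int = 10) -> set[str]:
    """Index-time 'autocomplete' analyzer: standard tokens, lowercase, then ngram filter."""
    terms = set()
    for tok in standard_tokens(text):
        for start in range(len(tok)):
            for end in range(start + min_gram, min(start + max_gram, len(tok)) + 1):
                terms.add(tok[start:end])
    return terms


INDEX = {doc_id: autocomplete_terms(title) for doc_id, title in DOCS.items()}


def match(query: str) -> list[str]:
    """match query with search_analyzer=standard: a doc hits if any query token was indexed."""
    q_tokens = standard_tokens(query)
    return [doc_id for doc_id, terms in INDEX.items() if any(t in terms for t in q_tokens)]


print(match("Doe"))
print(match("ckal"))
```

Because the search side uses `standard` rather than the ngram analyzer, the query term `doe` is looked up whole instead of being broken into `d`, `do`, `o`, and so on, which keeps single-letter grams from matching almost everything.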