 

elasticsearch analyzer - lowercase and whitespace tokenizer

How can I create a mapping that will tokenize the string on whitespace and also change it to lowercase for indexing?

This is my current mapping, which tokenizes on whitespace, but I can't work out how to lowercase the tokens and also apply the same analysis when searching (querying):

{
  "mappings": {
    "my_type" : {
      "properties" : {
        "title" : { "type" : "string", "analyzer" : "whitespace", "tokenizer": "whitespace", "search_analyzer":"whitespace" }
      }
    }
  }
}

Please help...

user3658423 asked Dec 13 '14 02:12
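To see why the current mapping gives case-sensitive matches, here is a rough pure-Python model (an illustration only, not Elasticsearch code). The built-in "whitespace" analyzer splits on whitespace but keeps the original case, and a term only matches if it is exactly equal to an indexed term. The helper names whitespace_analyzer and matches are hypothetical.

```python
def whitespace_analyzer(text: str) -> list[str]:
    # Approximates the built-in "whitespace" analyzer: split on whitespace, no lowercasing.
    return text.split()


def matches(indexed_text: str, query_text: str) -> bool:
    # Rough model of a match query: true if any query term equals an indexed term.
    indexed_terms = set(whitespace_analyzer(indexed_text))
    return any(term in indexed_terms for term in whitespace_analyzer(query_text))


title = "Quick Brown Fox"
print(whitespace_analyzer(title))  # ['Quick', 'Brown', 'Fox']
print(matches(title, "Quick"))     # True
print(matches(title, "quick"))     # False: the case differs, so the terms differ
```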


People also ask

What is the whitespace tokenizer in Elasticsearch?

The whitespace tokenizer breaks text into terms whenever it encounters a whitespace character.
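Besides the term text, each token keeps track of where it came from: Elasticsearch's _analyze API reports tokens with a start_offset, end_offset and position. The sketch below mimics that shape in plain Python by treating every run of non-whitespace characters as one token. It is an approximation, assuming Python's notion of whitespace is close enough: Elasticsearch uses Java's rules, which differ for a few Unicode characters such as the non-breaking space. The Token type and whitespace_tokenize function are hypothetical.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    term: str
    start_offset: int
    end_offset: int
    position: int


def whitespace_tokenize(text: str) -> list[Token]:
    # Every maximal run of non-whitespace characters becomes one token;
    # the whitespace between runs is discarded.
    return [
        Token(m.group(), m.start(), m.end(), pos)
        for pos, m in enumerate(re.finditer(r"\S+", text))
    ]


for token in whitespace_tokenize("New  York-based\tco."):
    print(token)  # terms: New, York-based, co.
```

Note that the hyphen and the trailing full stop stay inside their tokens: only whitespace separates terms.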

What are analyzers and tokenizers in Elasticsearch?

In Elasticsearch, an analyzer combines three kinds of building block: zero or more character filters, which "tidy up" a string before it is tokenized (e.g. by removing HTML tags); exactly one tokenizer, which breaks the string up into individual terms or tokens; and zero or more token filters, which change, add or remove tokens.
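The three stages can be modelled as plain functions applied in order: character filters on the raw string, then the single tokenizer, then token filters on the token list. This is a sketch, not Elasticsearch's implementation. The tag-stripping regex is a deliberately crude stand-in for Elasticsearch's "html_strip" character filter, and analyze, strip_tags, whitespace and lowercase are hypothetical helpers.

```python
import re
from typing import Callable, Sequence

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], list[str]]
TokenFilter = Callable[[list[str]], list[str]]


def strip_tags(text: str) -> str:
    # Crude stand-in for html_strip: replace anything that looks like a tag with a space.
    return re.sub(r"<[^>]*>", " ", text)


def whitespace(text: str) -> list[str]:
    # Tokenizer: split on whitespace.
    return text.split()


def lowercase(tokens: list[str]) -> list[str]:
    # Token filter: lowercase every token.
    return [t.lower() for t in tokens]


def analyze(
    text: str,
    char_filters: Sequence[CharFilter],
    tokenizer: Tokenizer,
    token_filters: Sequence[TokenFilter],
) -> list[str]:
    # Stage 1: character filters tidy up the raw string, in order.
    for char_filter in char_filters:
        text = char_filter(text)
    # Stage 2: exactly one tokenizer splits the string into tokens.
    tokens = tokenizer(text)
    # Stage 3: token filters change, add or remove tokens, in order.
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("<p>Hello <b>World</b></p>", [strip_tags], whitespace, [lowercase]))
# ['hello', 'world']
```

Replacing each tag with a space rather than deleting it keeps text such as "a<br>b" from being glued into a single token.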

What is a whitespace tokenizer?

In Stanford CoreNLP, WhitespaceTokenizer is a tokenizer that splits on, and discards, only whitespace characters. That implementation can return Word, CoreLabel or other LexedToken objects, and it has a parameter that controls whether an end-of-line is emitted as a token or treated as ordinary whitespace.

What is a tokenizer used for in Elasticsearch?

A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. For instance, the whitespace tokenizer breaks text into tokens whenever it sees any whitespace, so it would convert the text "Quick brown fox!" into the terms [Quick, brown, fox!].
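Running the same sentence through a whitespace split, with and without a lowercase step, shows what the tokenizer leaves alone: both the capital letter and the trailing "!" survive tokenization, and only a token filter such as "lowercase" changes the case. This is a plain-Python approximation, not Elasticsearch's own code.

```python
text = "Quick brown fox!"

# Whitespace tokenizer on its own: case and punctuation are kept.
whitespace_only = text.split()

# Whitespace tokenizer followed by a lowercase token filter.
whitespace_lowercase = [token.lower() for token in text.split()]

print(whitespace_only)       # ['Quick', 'brown', 'fox!']
print(whitespace_lowercase)  # ['quick', 'brown', 'fox!']
```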


1 Answer

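I managed to write a custom analyzer, and this works. The analyzer is defined under "settings" and then referenced by name from the field mapping; the "tokenizer" and "filter" keys belong in the analyzer definition, not in the field mapping:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercasespaceanalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "lowercasespaceanalyzer"
        }
      }
    }
  }
}

Because no "search_analyzer" is set, "lowercasespaceanalyzer" is also applied to query text, so searches on "title" are split on whitespace and lowercased in the same way as the indexed values.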
user3658423 answered Sep 27 '22 20:09
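Whatever analyzer runs at index time, the query text has to be turned into the same terms for a match. The sketch below models "lowercasespaceanalyzer" (whitespace tokenizer plus "lowercase" filter) in plain Python. It compares analysing the query with that same analyzer against analysing it with the plain "whitespace" analyzer, which is what a "search_analyzer" of "whitespace" would do. All functions are hypothetical stand-ins, not Elasticsearch code.

```python
def lowercasespaceanalyzer(text: str) -> list[str]:
    # Model of the custom analyzer: whitespace tokenizer, then lowercase filter.
    return [token.lower() for token in text.split()]


def whitespace_analyzer(text: str) -> list[str]:
    # Model of the built-in whitespace analyzer: no lowercasing.
    return text.split()


def matches(indexed_terms: set[str], query_terms: list[str]) -> bool:
    # Rough model of a match query: any query term equal to an indexed term.
    return any(term in indexed_terms for term in query_terms)


indexed = set(lowercasespaceanalyzer("Quick Brown Fox"))  # terms: quick, brown, fox

# Same analyzer at search time: the query is lowercased too, so it matches.
print(matches(indexed, lowercasespaceanalyzer("QUICK")))  # True

# Plain whitespace analyzer at search time: "QUICK" stays uppercase and misses.
print(matches(indexed, whitespace_analyzer("QUICK")))     # False
```

A search analyzer that skips the lowercase step would only match queries that the user happened to type in lowercase already.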