How can I create a mapping that tokenizes a string on whitespace and also lowercases it for indexing?
This is my current mapping. It tokenizes on whitespace, but I can't figure out how to lowercase the tokens, and how to search (query) the same way...
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": { "type": "string", "analyzer": "whitespace", "tokenizer": "whitespace", "search_analyzer": "whitespace" }
      }
    }
  }
}
Please help...
The whitespace tokenizer breaks text into terms whenever it encounters a whitespace character.
In Elasticsearch, an analyzer is a combination of:
Character filters: "tidy up" a string before it is tokenized, e.g. remove HTML tags. Zero or more.
Tokenizer: breaks the string up into individual terms or tokens. Exactly one is required.
Token filters: change, add, or remove tokens. Zero or more.
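As a sketch of how the three parts combine, the _analyze API lets you test an ad-hoc analysis chain without creating an index (the sample text here is made up, and the request-body form shown is the modern syntax; very old Elasticsearch versions pass these as query-string parameters instead):

POST _analyze
{
  "char_filter": ["html_strip"],
  "tokenizer": "whitespace",
  "filter": ["lowercase"],
  "text": "<b>Quick</b> brown Fox!"
}

The html_strip character filter removes the tags, the whitespace tokenizer splits the remaining text, and the lowercase token filter produces the terms [quick, brown, fox!].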
(As an aside, the WhitespaceTokenizer class in Stanford CoreNLP works the same way: it splits on and discards only whitespace characters. That implementation can return Word, CoreLabel, or other LexedToken objects, and it has a parameter controlling whether to make EOL a token or to treat EOL characters as whitespace.)
A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. For instance, a whitespace tokenizer breaks text into tokens whenever it sees any whitespace. It would convert the text "Quick brown fox!" into the terms [Quick, brown, fox!] .
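The same _analyze API can confirm this exact behaviour:

POST _analyze
{
  "tokenizer": "whitespace",
  "text": "Quick brown fox!"
}

The response lists the tokens Quick, brown, and fox! with their original case preserved, which is exactly why a lowercase token filter is needed on top of the whitespace tokenizer.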
I managed to write a custom analyzer, and this works:
"settings":{
"analysis": {
"analyzer": {
"lowercasespaceanalyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"my_type" : {
"properties" : {
"title" : { "type" : "string", "analyzer" : "lowercasespaceanalyzer", "tokenizer": "whitespace", "search_analyzer":"whitespace", "filter": [
"lowercase"
] }
}
}
}
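Note that tokenizer, filter, and search_analyzer do not belong on the field mapping itself: the custom analyzer already bundles the tokenizer and filter, and when search_analyzer is omitted, Elasticsearch reuses lowercasespaceanalyzer at query time, so searches are lowercased the same way. A minimal end-to-end check, assuming a hypothetical index named my_index created with the settings and mappings above:

PUT my_index/my_type/1
{ "title": "Quick Brown FOX!" }

GET my_index/_search
{
  "query": {
    "match": { "title": "quick" }
  }
}

The match query runs "quick" through the field's analyzer and finds the document, because both the indexed terms and the query term were lowercased. On Elasticsearch 5 and later, the field type would be text rather than string, but the analyzer setup is the same.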