I have looked at every article and post I could find about performing exact-match, case-insensitive queries, but upon implementation, they do not perform what I am looking for.
Before you mark this question as a duplicate, please read the entire post.
Given a username, I want to query my Elasticsearch database to only return a document that exactly matches the username, but is also case insensitive.
I have tried specifying a lowercase analyzer for my username property and using a match query to implement this behavior. While this solves the problem of case-insensitive matching, it fails at exact matching.
I looked into using a lowercase normalizer, but that would make all of my usernames lowercase before indexing, so when I aggregate the usernames, they would be returned in lowercase form, which is not what I want. I need to preserve the original case of each letter in the username.
POST {elastic}/users/_doc
{
  "email": "[email protected]",
  "username": "UsErNaMe",
  "password": "1234567"
}
This document will be stored in an index called users exactly the way it is.
GET {frontend}/user/UsErNaMe
should return
{
  "email": "[email protected]",
  "username": "UsErNaMe",
  "password": "1234567"
}
and
GET {frontend}/user/username
should return
{
  "email": "[email protected]",
  "username": "UsErNaMe",
  "password": "1234567"
}
and
GET {frontend}/user/USERNAME
should return
{
  "email": "[email protected]",
  "username": "UsErNaMe",
  "password": "1234567"
}
and
GET {frontend}/user/UsErNaMe $RaNdoM LeTteRs
should NOT return anything.
Thank you.
To achieve a case-insensitive exact match, you need to define your own analyzer. The analyzer needs to do two things: lowercase the input value, and keep the whole value as a single token.
Both can be achieved as follows:
Use the lowercase filter when defining the custom analyzer.
Set the tokenizer to keyword. This makes sure the whole input value is emitted as a single token, which the lowercase filter then converts to lowercase.
This custom analyzer can now be applied to a text field where case-insensitive exact search is required.
To create the index, you can use the following:
PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_insensitive_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "email": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "username": {
          "type": "text",
          "analyzer": "case_insensitive_analyzer"
        },
        "password": {
          "type": "keyword"
        }
      }
    }
  }
}
In the above, case_insensitive_analyzer is the required analyzer, and as you can see, it is applied to the username field.
So when you index a document as below:
PUT test/_doc/1
{
  "email": "[email protected]",
  "username": "UsErNaMe",
  "password": "1234567"
}
For the username field, the input is UsErNaMe. The analyzer first applies the keyword tokenizer, which does nothing but emit the entire input as a single token, i.e. UsErNaMe. It then applies the lowercase filter to that token, resulting in the value username, which is what gets indexed. The original value stored in _source is left unchanged, so the document is still returned as UsErNaMe.
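The effect of this analyzer on a single value can be imitated in a few lines of plain Python. This is only an illustrative sketch, not Elasticsearch code: the function names keyword_tokenizer, lowercase_filter and analyze are made up for the example, and str.lower() only approximates what the lowercase filter does. It relies on the fact that an Elasticsearch analyzer runs its tokenizer first and then passes the tokens through its filters.

```python
def keyword_tokenizer(text):
    """Imitate the keyword tokenizer: emit the whole input as one token."""
    return [text]


def lowercase_filter(tokens):
    """Imitate the lowercase token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Run the tokenizer first, then apply the token filter to its output."""
    return lowercase_filter(keyword_tokenizer(text))


tokens = analyze("UsErNaMe")
print(tokens)  # ['username']
```

Because the tokenizer never splits the input, a value containing spaces also stays a single token, which is what makes the match exact.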
Now you can use a match query as below to search against the username field:
GET test/_doc/_search
{
  "query": {
    "match": {
      "username": "USERNAME"
    }
  }
}
Using the above will give you the desired output. Replacing USERNAME in the above query with username, UsErNaMe, or USERname will still match the document. The reason is that if no analyzer is explicitly specified at search time, Elasticsearch uses the analyzer that was applied to the field at index time. In the above case, when searching against the username field, case_insensitive_analyzer is applied to the input value, i.e. USERNAME, which results in the token username, and hence the match. A longer input such as UsErNaMe $RaNdoM LeTteRs is analyzed into the single token username $random letters, which is not equal to the indexed token username, so nothing is returned.
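The matching behavior for the four requests in the question can be sketched with a toy in-memory index in Python. This is a simplified model under stated assumptions, not how Elasticsearch is implemented: analyze, build_index and match_username are hypothetical helpers, the index is a plain dict from token to documents, and only the username field is modeled.

```python
def analyze(text):
    # Stand-in for case_insensitive_analyzer:
    # keyword tokenizer (one token) followed by the lowercase filter.
    return [text.lower()]


def build_index(docs):
    """Map each analyzed username token to the original, unmodified documents."""
    index = {}
    for doc in docs:
        for token in analyze(doc["username"]):
            index.setdefault(token, []).append(doc)
    return index


def match_username(index, query):
    """Analyze the query with the same analyzer and look up the resulting token."""
    hits = []
    for token in analyze(query):
        hits.extend(index.get(token, []))
    return hits


index = build_index([{"username": "UsErNaMe"}])

for query in ["UsErNaMe", "username", "USERNAME", "UsErNaMe $RaNdoM LeTteRs"]:
    print(repr(query), "->", match_username(index, query))
```

The first three queries return the document with its original casing, because only the indexed token is lowercased, while the last query returns an empty list.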