My ElasticSearch index is not correctly indexing and querying non-alphanumeric characters. Specifically, dots and dashes are causing problems.
If I index a document with the name "O.K. Corral," it should match queries for "OK Corral". Similarly, if I index "Whiskey A Go-Go," I'd like it to match "Whiskey A GoGo" and "Whiskey A Go Go".
Right now, only queries with the correct dots and dashes will return these documents.
I'm hoping the solution will also solve any potential problems with other non-alphanumeric characters, like commas and apostrophes.
It sounds like a job for ElasticSearch token filters, but I haven't been able to find one that does what I'm looking for. Also, I would like to do this within ElasticSearch -- I don't want to write custom string manipulations to normalize data before it gets to my ES index.
Thanks for your help!
You might want to have a look at the Word Delimiter Token Filter. It will at least do what you want with "Whiskey A GoGo" and "Whiskey A Go-Go,". You can check its behaviour in advance using the analyze api.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With