I am trying to get my head around how fuzzy search works on AWS CloudSearch.
I want to find "Star Wars", but in my search I misspell it as "ster wers".
My app appends the fuzzy operator to the query, but it never returns Star Wars. I have tried:
ster~1 wers~1
"ster wers"~2
"ster"~1 "wers"~1
What am I missing here?
Your query doesn't work because of how CloudSearch stems terms. If your field is indexed with the Analysis Scheme set to English, then "wars" will be stored in its stemmed form as "war".
Here's a little demo of how stemming affects your query.
Searching with the un-stemmed query ('ster wers'):
Searching with the un-stemmed query requires you to match "wers" to "war", which is off by 2 characters and requires this query: q=ster~1+wers~2
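As a rough illustration of how an app could build that request parameter, here is a minimal Python sketch. The helper name fuzzy_query is hypothetical, and it assumes Python 3.7 or later, where urlencode leaves the tilde unescaped and turns spaces into plus signs.

```python
from urllib.parse import urlencode


def fuzzy_query(terms_with_distance):
    """Build a URL-encoded q parameter with a per-term fuzzy distance.

    terms_with_distance is a list of (term, max_edits) pairs.
    This helper is illustrative only; it is not part of any AWS SDK.
    """
    # Append ~N to each term, e.g. ("wers", 2) becomes "wers~2".
    q = " ".join(f"{term}~{dist}" for term, dist in terms_with_distance)
    # urlencode encodes the space as "+" and keeps "~" as-is (Python 3.7+).
    return urlencode({"q": q})


print(fuzzy_query([("ster", 1), ("wers", 2)]))
```

Allowing a different distance per term matters here, because a single ~1 on every term is not enough once the indexed term has been stemmed.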
Searching with the stemmed query ('ster wer'):
Searching with the stemmed version means you're matching "wer" to "war", so you're only off by 1 character. Thus ster~1 wer~1 will get the desired result (i.e. it matches "star wars").
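To check the character counts above, here is a small Python sketch that computes plain Levenshtein edit distance. It assumes the fuzzy operator's distance behaves like ordinary edit distance (insertions, deletions, substitutions), which is an approximation for illustration and not a statement about CloudSearch internals.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


# Compare the typed terms against what the English scheme would index.
for typed, indexed in [("ster", "star"), ("wers", "war"), ("wer", "war")]:
    print(typed, "->", indexed, "=", edit_distance(typed, indexed))
```

With these inputs, "wers" against the stemmed "war" needs 2 edits, while "ster" against "star" and "wer" against "war" need 1 each, which lines up with the queries shown above.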
How to fix:
The use case you described will work if you configure the Analysis Scheme for the field in question to not use any stemming.
To do this, log into the AWS Web Console and go to Analysis Schemes --> Add Analysis Scheme:
Then go to Indexing Options and configure your field to use your new no-stemming analysis scheme:
Submit your changes and re-index.
That will address your issue but of course you'll lose the benefits of stemming. You can't have your cake and eat it too.
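If you would rather script the console steps than click through them, a sketch along these lines with boto3 may work. The domain name, field name, and scheme name are placeholders, the field is assumed to be of type text, and the request shapes should be checked against the current boto3 CloudSearch documentation before use. Redefining an index field may also reset its other options, so review those as well.

```python
def no_stemming_requests(domain, field, scheme="no_stemming"):
    """Build request payloads for a no-stemming analysis scheme.

    domain, field and scheme are placeholder names for illustration.
    """
    analysis_scheme = {
        "DomainName": domain,
        "AnalysisScheme": {
            "AnalysisSchemeName": scheme,
            "AnalysisSchemeLanguage": "en",
            # Turn algorithmic stemming off for this scheme.
            "AnalysisOptions": {"AlgorithmicStemming": "none"},
        },
    }
    index_field = {
        "DomainName": domain,
        "IndexField": {
            "IndexFieldName": field,
            "IndexFieldType": "text",  # assumption: the field is a text field
            "TextOptions": {"AnalysisScheme": scheme},
        },
    }
    return analysis_scheme, index_field


def apply_no_stemming(domain, field):
    """Send the requests and trigger a re-index (needs AWS credentials)."""
    import boto3  # imported here so the payload builder works without it

    client = boto3.client("cloudsearch")
    scheme_req, field_req = no_stemming_requests(domain, field)
    client.define_analysis_scheme(**scheme_req)
    client.define_index_field(**field_req)
    client.index_documents(DomainName=domain)  # re-index with the new scheme


scheme_req, field_req = no_stemming_requests("my-domain", "title")
print(scheme_req["AnalysisScheme"]["AnalysisOptions"])
```

Only the payload builder runs here; apply_no_stemming is left uncalled because it talks to a live domain.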