
CloudSearch fuzzy terms and phrases

I am trying to get my head around how fuzzy search works on AWS CloudSearch.

I want to find "Star Wars", but in my search I misspell it as:

ster wers

My app adds fuzzy operators to the query, but it never returns Star Wars. I have tried:

ster~1 wers~1
"ster wers"~2
"ster"~1 "wers"~1

What am I missing here?

dmo asked Mar 31 '15 11:03



1 Answer

Your query doesn't work because of how CloudSearch stems terms. If your field is indexed with the Analysis Scheme set to English, then wars is stored in its stemmed form, war.

Here's a little demo of how stemming is affecting your query.

Searching with the un-stemmed query ('ster wers'):

With the un-stemmed query you have to match wers to the indexed term war, which is off by 2 chars (change e to a, drop the s), so you need this query: q=ster~1+wers~2.
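As a rough illustration, the query string above could be assembled like this. The helper name `fuzzy_query` is made up for this sketch, and it only builds the `q=` parameter; the search endpoint and any other parameters are left out.

```python
from urllib.parse import urlencode


def fuzzy_query(terms):
    """Build a CloudSearch-style q= parameter from (term, max_edits) pairs.

    Hypothetical helper: each term gets the ~N fuzzy suffix, terms are
    joined with spaces, and urlencode turns the spaces into '+'.
    """
    q = " ".join(f"{term}~{edits}" for term, edits in terms)
    # safe="~" keeps the tilde literal on every Python 3 version.
    return urlencode({"q": q}, safe="~")


print(fuzzy_query([("ster", 1), ("wers", 2)]))  # q=ster~1+wers~2
```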

Searching with the stemmed query ('ster wer'):

With the stemmed version you're matching wer to war, which is only off by 1 char. Thus ster~1 wer~1 gets the desired result (i.e. it matches star wars).
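The "off by N chars" counts in this demo can be checked with a plain edit-distance function. This is a minimal sketch that assumes the number after `~` behaves like a Levenshtein distance against the indexed (stemmed) term; it is not CloudSearch's own matching code.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes, or substitutions
    needed to turn a into b (classic dynamic-programming version)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


# Query term vs. the term stored in the index after English stemming.
print(levenshtein("ster", "star"))  # 1
print(levenshtein("wers", "war"))   # 2
print(levenshtein("wer", "war"))    # 1
```

Against an index without stemming the stored term would be wars, and `levenshtein("wers", "wars")` is 1, which is why wers~1 is enough in that case.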

How to fix:

The use case you described will work if you configure the Analysis Scheme for the field in question to not use any stemming.

  1. To do this, log into the AWS Web Console and go to Analysis Schemes --> Add Analysis Scheme.

  2. Then go to Indexing Options and configure your field to use your new no-stemming analysis scheme.

  3. Submit your changes and re-index.
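If you'd rather script it than click through the console, the same three steps might look roughly like this with boto3's `cloudsearch` client. Treat it as a sketch: the function name, the scheme name `no_stemming`, and the domain and field names in the usage note are placeholders, and the field is assumed to be of type `text`.

```python
def apply_no_stemming(client, domain, field, scheme="no_stemming"):
    """Create a no-stemming analysis scheme, attach it to a text field,
    and trigger a re-index. `client` is a boto3 CloudSearch client."""
    # Step 1: English analysis scheme with algorithmic stemming turned off.
    client.define_analysis_scheme(
        DomainName=domain,
        AnalysisScheme={
            "AnalysisSchemeName": scheme,
            "AnalysisSchemeLanguage": "en",
            "AnalysisOptions": {"AlgorithmicStemming": "none"},
        },
    )
    # Step 2: point the field at the new scheme (assumes a 'text' field).
    client.define_index_field(
        DomainName=domain,
        IndexField={
            "IndexFieldName": field,
            "IndexFieldType": "text",
            "TextOptions": {"AnalysisScheme": scheme},
        },
    )
    # Step 3: rebuild the index so the new scheme takes effect.
    client.index_documents(DomainName=domain)
```

Usage would be something like `apply_no_stemming(boto3.client("cloudsearch"), "my-domain", "title")`. Since this redefines the field, it's worth checking that the field's other options (return, sort, highlight) still match what you had before.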

That will address your issue, but of course you'll lose the benefits of stemming. You can't have your cake and eat it too.

alexroussos answered Oct 04 '22 05:10