Analyze similarities in model data using Elasticsearch and Rails

I would like to use Elasticsearch to analyze data and display it to the user.

When a user views a record for a model, I want to display a list of 'similar' records in the database for that model, and the percentage of similarity. This would match against every field on the model.

I am aware that with the Searchkick gem I can use a command to find similar records:

    product = Product.first
    product.similar(fields: ["name"], where: {size: "12 oz"})

I would like to take this further and compare entire records (and eventually associations).

Is this feasible with Elasticsearch / Searchkick in Rails, or should I use another method to analyze the data?

Drew asked Oct 19 '22 14:10



1 Answer

Elasticsearch has a feature built for exactly this purpose: the more_like_this (mlt) query. Its documentation goes into great detail about how to achieve what you want.

The content you pass in the like field is analyzed, and the most relevant terms for each field are used to retrieve documents that contain as many of those terms as possible. If all your records are stored in Elasticsearch, you can use the Multi GET syntax to reference a document already in your index as the content of the like field:

    "like" : [
      {
        "_index" : "model",
        "_type" : "model",
        "_id" : "1"
      }
    ]

Remember that you cannot use index aliases with this syntax, so you'll have to do a document lookup first if you are not sure which index your document currently resides in.

If you don't specify the fields parameter, all fields in the source document will be used. To avoid bad surprises, I suggest always specifying the list of fields you want similar documents to match on.
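As a rough sketch of the points above, the query body can be built as a plain Ruby hash. The helper name `more_like_this_query`, the index name `products`, and the field names are made up for illustration. The `_type` key is left out on the assumption that you run a recent Elasticsearch version without mapping types; add it back if your cluster still needs it.

```ruby
# Hypothetical helper: builds a more_like_this query that uses an
# already-indexed document as the "like" content and matches only on an
# explicit list of fields.
def more_like_this_query(index:, id:, fields:)
  {
    more_like_this: {
      fields: fields,
      like: [{ _index: index, _id: id.to_s }],
      # Elasticsearch defaults these to 2 and 5. They are lowered here
      # (an assumption, tune for your data) because short records rarely
      # repeat a term and small indexes rarely share one across 5 docs.
      min_term_freq: 1,
      min_doc_freq: 1
    }
  }
end

# Example with made-up index and field names.
query = more_like_this_query(index: "products", id: 1, fields: %w[name description])
```

Listing the fields explicitly keeps numeric or date fields from being pulled into the term analysis by accident.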

If you have non-textual fields that must match the source document exactly, consider wrapping the mlt query in a bool query and building its filter section programmatically, so that only a filtered subset of your index is considered.
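One way this could look, again as a hedged sketch: the helper `similar_with_exact_filters` is hypothetical, and the term filters assume the filtered fields are mapped as keyword or numeric fields so that exact matching works.

```ruby
# Hypothetical helper: wraps an mlt query in a bool query and adds one
# term filter per field that must match the source document exactly.
def similar_with_exact_filters(mlt_query, exact_values)
  filters = exact_values.map { |field, value| { term: { field => value } } }
  { bool: { must: [mlt_query], filter: filters } }
end

# Made-up example values; in practice they would be read from the source record.
mlt = {
  more_like_this: {
    fields: ["name"],
    like: [{ _index: "products", _id: "1" }]
  }
}
query = similar_with_exact_filters(mlt, size: "12 oz", category_id: 7)
```

Clauses in the filter section do not affect scoring, so the ranking still comes entirely from the mlt part.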

You can build these queries in Searchkick using the advanced search feature, manually specifying the body of search requests.
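A minimal sketch of that last step, assuming a Searchkick-enabled model whose `search` method accepts a `body:` option (Searchkick's advanced search). The helper `similar_records` is hypothetical, and resolving the concrete index name is left to the caller, since the alias restriction mentioned above may apply.

```ruby
# Hypothetical helper: sends a hand-built more_like_this body through a
# model's advanced search. `model` is assumed to be a Searchkick-enabled
# class such as Product; `index` should be the concrete index name.
def similar_records(model, record_id:, index:, fields:)
  body = {
    query: {
      more_like_this: {
        fields: fields,
        like: [{ _index: index, _id: record_id.to_s }]
      }
    }
  }
  model.search(body: body)
end

# Usage inside a Rails app (not run here; index name is a placeholder):
#   similar_records(Product, record_id: product.id,
#                   index: "products_production_...", fields: %w[name description])
```

Each hit carries Elasticsearch's relevance score rather than a percentage, so any percentage shown to the user would have to be derived from those scores by the application.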

Michele Palmia answered Oct 29 '22 20:10
