How can I use the ElasticSearch-Rails query DSL to return related relationships?

I am new to ElasticSearch, but need to use it to return a list of products. Please do not include answers or links to old answers which reference the deprecated tire gem.

gemfile

    ruby '2.2.0'
    gem 'rails', '4.0.3'
    gem 'elasticsearch-model', '~> 0.1.6'
    gem 'elasticsearch-rails', '~> 0.1.6'

I have a couple models with relationships. I included the relationships below.

Models and Relationships

product.rb

    include Searchable

    belongs_to :family
    belongs_to :collection
    has_many :benefits_products
    has_many :benefits, :through => :benefits_products

    def as_indexed_json(options={})
      as_json(
        include: {:benefits => { :only => [ :id, :name ] },
                  :categories => { :only => [ :id, :name ] } }
      )
    end

collection.rb

    include Searchable

    has_many :products

    def as_indexed_json(options={})
      as_json(
        include: [:products]
      )
    end

family.rb

    include Searchable

    has_many :products

    def as_indexed_json(options={})
      as_json(
        include: [:products]
      )
    end

benefit.rb

    include Searchable

    has_many :benefits_products
    has_many :products, :through => :benefits_products

    def as_indexed_json(options={})
      as_json(
        include: [:products]
      )
    end

searchable.rb is just a concern that includes Elasticsearch and its callbacks in all models:

    module Searchable
      extend ActiveSupport::Concern

      included do
        include Elasticsearch::Model
        include Elasticsearch::Model::Callbacks

        settings index: { number_of_shards: 1, number_of_replicas: 0 } do
          mapping do

            indexes :id, type: 'long'
            indexes :name, type: 'string'
            indexes :family_id, type: 'long'
            indexes :collection_id, type: 'long'
            indexes :created_at, type: 'date'
            indexes :updated_at, type: 'date'

            indexes :benefits, type: 'nested' do
              indexes :id, type: 'long'
              indexes :name, type: 'string'
            end

            indexes :categories, type: 'nested' do
              indexes :id, type: 'long'
              indexes :name, type: 'string'
            end

          end
        end

        def self.search(options={})
          __set_filters = lambda do |key, f|

            @search_definition[:filter][:and] ||= []
            @search_definition[:filter][:and]  |= [f]
          end

          @search_definition = {
            query: {
              filtered: {
                query: {
                  match_all: {}
                }
              }
            },
            filter: {}
          }

          if options[:benefits]
            f = { term: { "benefits.id": options[:benefits] } }

            __set_filters.(:collection_id, f)
            __set_filters.(:family_id, f)
            __set_filters.(:categories, f)
          end

          def as_indexed_json(options={})
            as_json(
              include: {:benefits => { :only => [ :id, :name ] },
                        :categories => { :only => [ :id, :name ] } }
            )
          end

          if options[:categories]
            ...
          end

          if options[:collection_id]
            ...
          end

          if options[:family_id]
            ...
          end

          __elasticsearch__.search(@search_definition)
        end

      end
    end

ElasticSearch

I break down dash-separated slugs into the various families, collections and benefits. I am able to search for products with a specific family or collection and get correct results. I am also able to get results for one benefit, but they don't appear to be accurate, and searching for multiple benefits yields strange results. I would like the "AND" combination of all searched fields, but my result doesn't seem to be the result of either "AND" or "OR", which confuses me as well.

What do I pass to the Product.search method to yield desired results?

Thanks for any help you can provide!

Edit

I have now verified that benefits are indexed on the products. I used curl -XGET 'http://127.0.0.1:9200/products/_search?pretty=1' which produced a json response that looked like this:

    {
      "id": 4,
      "name": "product name",
      "family_id": 16,
      "collection_id": 6,
      "created_at": "2015-04-13T12:49:42.000Z",
      "updated_at": "2015-04-13T12:49:42.000Z",
      "benefits": [
        {"id": 2, "name": "my benefit 2"},
        {"id": 6, "name": "my benefit 6"},
        {"id": 7, "name": "my benefit 7"}
      ],
      "categories": [
        {"id": 2, "name": "category 2"}
      ]
    },
    {...}

Now I just need to figure out how to search for the product with benefits 2, 6, AND 7 in ElasticSearch, if I wanted the above example product. I am specifically looking for the syntax to submit to the elasticsearch #search method to get the results of a nested "AND" query, the nested query setup/mappings (to make sure I have not missed anything), and any other relevant info you can think of to troubleshoot this.

Update

The Searchable concern has been updated to reflect the answer received. I translated the mapping json object to fit in the elasticsearch-model syntax. My remaining confusion occurs when I attempt to translate the query in a similar fashion.

Second Update

I am basing most of my searchable.rb concern on the elasticsearch-rails example app. I have updated searchable.rb to reflect this code, and while I am getting results, they are not the result of an "AND" execution. When I apply two benefits, I get all products that have either benefit.

Thomas asked Apr 10 '15 14:04



1 Answer

By default, if you load the data with dynamic mapping, ES indexes inner objects as flat fields on the parent document and hence loses the relation between the properties of each inner object. To maintain those relations, we can use either nested objects or parent-child relations.
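To illustrate what "flat" means here, below is a rough plain-Ruby sketch. It is only a simplified model of the flattening, not how ES is implemented (real string fields are also analyzed into tokens), and the helper name flatten_objects is made up for this example.

```ruby
# Simplified model: every inner field of an array of objects is collected
# into one multi-valued field on the parent document.
# `flatten_objects` is a hypothetical helper, not part of any gem.
def flatten_objects(field, objects)
  objects.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |object, flat|
    object.each { |key, value| flat["#{field}.#{key}"] << value }
  end
end

benefits = [
  { "id" => 2, "name" => "my benefit 2" },
  { "id" => 6, "name" => "my benefit 6" }
]

flat = flatten_objects("benefits", benefits)

# On the flattened form, "id 2 AND name 'my benefit 6'" matches,
# although no single benefit has both values.
cross_match = flat["benefits.id"].include?(2) &&
              flat["benefits.name"].include?("my benefit 6")

# Checking each benefit as its own unit (what nested objects preserve)
# correctly finds no such benefit.
true_match = benefits.any? { |b| b["id"] == 2 && b["name"] == "my benefit 6" }

puts "flattened match: #{cross_match}, per-object match: #{true_match}"
```

The flattened check succeeds while the per-object check does not, which is the kind of lost relation that nested objects are meant to avoid.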

Here I will use nested objects to achieve the desired result.

Mapping:

    PUT /index-3
    {
      "mappings": {
        "products": {
          "properties": {
            "id": { "type": "long" },
            "name": { "type": "string" },
            "family_id": { "type": "long" },
            "collection_id": { "type": "long" },
            "created_at": { "type": "date" },
            "updated_at": { "type": "date" },
            "benefits": {
              "type": "nested",
              "include_in_parent": true,
              "properties": {
                "id": { "type": "long" },
                "name": { "type": "string" }
              }
            },
            "categories": {
              "type": "nested",
              "include_in_parent": true,
              "properties": {
                "id": { "type": "long" },
                "name": { "type": "string" }
              }
            }
          }
        }
      }
    }

Note that the child objects are mapped as nested and, via include_in_parent, are also included in the parent document.

Now some sample data:

    PUT /index-3/products/4
    {
      "name": "product name 4",
      "family_id": 15,
      "collection_id": 6,
      "created_at": "2015-04-13T12:49:42.000Z",
      "updated_at": "2015-04-13T12:49:42.000Z",
      "benefits": [
        {"id": 2, "name": "my benefit 2"},
        {"id": 6, "name": "my benefit 6"},
        {"id": 7, "name": "my benefit 7"}
      ],
      "categories": [
        {"id": 2, "name": "category 2"}
      ]
    }

    PUT /index-3/products/5
    {
      "name": "product name 5",
      "family_id": 16,
      "collection_id": 6,
      "created_at": "2015-04-13T12:49:42.000Z",
      "updated_at": "2015-04-13T12:49:42.000Z",
      "benefits": [
        {"id": 5, "name": "my benefit 2"},
        {"id": 6, "name": "my benefit 6"},
        {"id": 7, "name": "my benefit 7"}
      ],
      "categories": [
        {"id": 3, "name": "category 2"}
      ]
    }

    PUT /index-3/products/6
    {
      "name": "product name 6",
      "family_id": 15,
      "collection_id": 5,
      "created_at": "2015-04-13T12:49:42.000Z",
      "updated_at": "2015-04-13T12:49:42.000Z",
      "benefits": [
        {"id": 5, "name": "my benefit 2"},
        {"id": 55, "name": "my benefit 6"},
        {"id": 7, "name": "my benefit 7"}
      ],
      "categories": [
        {"id": 3, "name": "category 2"}
      ]
    }

And now the query part:

    GET index-3/products/_search
    {
      "query": {
        "filtered": {
          "query": {
            "match_all": {}
          },
          "filter": {
            "terms": {
              "benefits.id": [5, 6, 7],
              "execution": "and"
            }
          }
        }
      }
    }

Which produces the following result:

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
          {
            "_index": "index-3",
            "_type": "products",
            "_id": "5",
            "_score": 1,
            "_source": {
              "name": "product name 5",
              "family_id": 16,
              "collection_id": 6,
              "created_at": "2015-04-13T12:49:42.000Z",
              "updated_at": "2015-04-13T12:49:42.000Z",
              "benefits": [
                {"id": 5, "name": "my benefit 2"},
                {"id": 6, "name": "my benefit 6"},
                {"id": 7, "name": "my benefit 7"}
              ],
              "categories": [
                {"id": 3, "name": "category 2"}
              ]
            }
          }
        ]
      }
    }

At query time, we have to use a terms filter with "execution": "and" so that it retrieves only the documents that contain all of the listed terms.
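Since the question asks how to express this from Ruby, here is a minimal sketch of the same query as a plain Ruby hash. The helper name benefits_and_definition is hypothetical, and the sketch assumes the Elasticsearch 1.x syntax used in this answer (the filtered query and the terms filter's execution option).

```ruby
require "json"

# Hypothetical helper: builds the search definition shown above
# for any list of benefit ids (strings from a slug are converted to integers).
def benefits_and_definition(benefit_ids)
  {
    query: {
      filtered: {
        query: { match_all: {} },
        filter: {
          terms: {
            "benefits.id" => Array(benefit_ids).map(&:to_i),
            execution: "and" # every listed id must be present on the document
          }
        }
      }
    }
  }
end

definition = benefits_and_definition([2, 6, 7])

# Print the JSON body that this hash represents.
puts JSON.pretty_generate(definition)

# In the Rails app, the hash would presumably be handed to the model the
# same way the question's concern already does it, for example:
#   Product.__elasticsearch__.search(definition)
```

With the ids 2, 6 and 7 this should match only products carrying all three benefits, such as product 4 in the sample data.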

monu answered Nov 23 '22 15:11