Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Index the results of a method in ElasticSearch (Tire + ActiveRecord)

I'm indexing a data set for elasticsearch using Tire and ActiveRecord. I have an Artist model, which has_many :images. How can I index a method of the Artist model which returns a specific image? Or alternatively reference a method of the associated model? My desired Artist result will include the paths for the primary Image associated with the Artist (both the original and the thumbnail).

I've tried this mapping:

mapping do
  indexes :id,                  :index    => :not_analyzed
  indexes :name                     
  indexes :url
  indexes :primary_image_original       
  indexes :primary_image_thumbnail
end

to reference these Artist methods:

    def primary_image_original  
        return images.where(:priority => 'primary').first.original
    end

    def primary_image_thumbnail
        return images.where(:priority => 'primary').first.thumbnail_150
    end

This just ignores the indexed methods. Based on other answers like Elasticsearch, Tire, and Nested queries / associations with ActiveRecord, I tried this:

mapping do
  indexes :id,                  :index    => :not_analyzed
  indexes :name 
  indexes :url
  indexes :images do
    indexes :original
    indexes :thumbnail_150
    indexes :priority
  end
end

def to_indexed_json
    to_json(include: { images: { only: [:original, :thumbnail_150, :priority] } } )
end

But this also doesn't return what I'm after. I've spent several hours googling and reading the elasticsearch and Tire documentation and haven't found a working example of this pattern to follow. Thanks for your ideas!

like image 507
johnny.rodgers Avatar asked Nov 28 '12 07:11

johnny.rodgers


1 Answers

So, to include your solution to the indexing problem here.

Indexing associations

One way to index a method is to include it in the to_json call:

def to_indexed_json
  to_json( 
    :only   => [ :id, :name, :normalized_name, :url ],
    :methods   => [ :primary_image_original, :primary_image_thumbnail, :account_balance ]
  )
end

Another one, and more preferable, is to use the :as option in the mapping block:

mapping do
  indexes :id, :index    => :not_analyzed
  indexes :name             
  # ...

  # Relationships
  indexes :primary_image_original, :as => 'primary_image_original'
  indexes :account_balance,        :as => 'account_balance'
end

Fighting n+1 queries when importing

The problem with slow indexing is most probably due to n+1 queries in the database: for every artist you index, you issue a query for images (original and thumbnail). A much more performant way would be to join the associated records in one query; see Eager Loading Associations in Rails Guides.

The Tire Index#import method, and the import Rake task, allow you to pass parameters which are then sent to the paginate method down the wire.

So let's compare the naive approach:

bundle exec rake environment tire:import CLASS=Article FORCE=true
Article Load (7.6ms)  SELECT "articles".* FROM "articles" LIMIT 1000 OFFSET 0
Comment Load (0.2ms)  SELECT "comments".* FROM "comments" WHERE ("comments".article_id = 1)
Comment Load (0.1ms)  SELECT "comments".* FROM "comments" WHERE ("comments".article_id = 2)
...
Comment Load (0.3ms)  SELECT "comments".* FROM "comments" WHERE ("comments".article_id = 100)

And when we pass the include fragment:

bundle exec rake environment tire:import PARAMS='{:include => ["comments"]}'  CLASS=Article FORCE=true 
Article Load (8.7ms)  SELECT "articles".* FROM "articles" LIMIT 1000 OFFSET 0
Comment Load (31.5ms) SELECT "comments".* FROM "comments" WHERE ("comments".article_id IN (1,2, ... ,100))

Much better :) Please try it out and let me know if it solves your issue.


You can also try it out in the Rails console: Article.import vs. Article.import(include: ['comments']). As a side note, this exact problem was the reason for supporting the params hash in the whole importing toolchain in Tire.

like image 67
karmi Avatar answered Oct 04 '22 17:10

karmi