I'm indexing a data set for elasticsearch using Tire and ActiveRecord. I have an Artist model, which has_many :images. How can I index a method of the Artist model which returns a specific image? Or alternatively reference a method of the associated model? My desired Artist result will include the paths for the primary Image associated with the Artist (both the original and the thumbnail).
I've tried this mapping:
mapping do
  indexes :id, :index => :not_analyzed
  indexes :name
  indexes :url
  indexes :primary_image_original
  indexes :primary_image_thumbnail
end
to reference these Artist methods:
def primary_image_original
  return images.where(:priority => 'primary').first.original
end

def primary_image_thumbnail
  return images.where(:priority => 'primary').first.thumbnail_150
end
This just ignores the indexed methods. Based on other answers, such as "Elasticsearch, Tire, and Nested queries / associations with ActiveRecord", I tried this:
mapping do
  indexes :id, :index => :not_analyzed
  indexes :name
  indexes :url
  indexes :images do
    indexes :original
    indexes :thumbnail_150
    indexes :priority
  end
end

def to_indexed_json
  to_json(include: { images: { only: [:original, :thumbnail_150, :priority] } } )
end
But this also doesn't return what I'm after. I've spent several hours googling and reading the elasticsearch and Tire documentation and haven't found a working example of this pattern to follow. Thanks for your ideas!
For the record, here is the solution to the indexing problem.
One way to index a method is to include it in the to_json call:
def to_indexed_json
  to_json(
    :only    => [ :id, :name, :normalized_name, :url ],
    :methods => [ :primary_image_original, :primary_image_thumbnail, :account_balance ]
  )
end
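To make the effect of :methods concrete, here is a minimal plain-Ruby sketch that runs without Rails or Tire. The Image struct and Artist class are hypothetical stand-ins for the ActiveRecord models, and the hand-rolled to_indexed_json only imitates what to_json(:only => ..., :methods => ...) produces: each listed name is called on the record and its return value is stored under that name in the document.

```ruby
require 'json'

# Hypothetical stand-in for the Image model; no database involved.
Image = Struct.new(:priority, :original, :thumbnail_150)

# Hypothetical stand-in for the Artist model.
class Artist
  attr_reader :id, :name, :url, :images

  def initialize(id, name, url, images)
    @id, @name, @url, @images = id, name, url, images
  end

  def primary_image_original
    primary = images.find { |image| image.priority == 'primary' }
    primary && primary.original
  end

  def primary_image_thumbnail
    primary = images.find { |image| image.priority == 'primary' }
    primary && primary.thumbnail_150
  end

  # Imitates to_json(:only => [...], :methods => [...]): every listed name
  # is called on the record and the result is stored under that key.
  def to_indexed_json
    names = [:id, :name, :url, :primary_image_original, :primary_image_thumbnail]
    names.each_with_object({}) { |name, doc| doc[name] = public_send(name) }.to_json
  end
end

artist = Artist.new(1, 'Example', '/artists/1', [
  Image.new('secondary', '/img/b.jpg', '/img/b_150.jpg'),
  Image.new('primary',   '/img/a.jpg', '/img/a_150.jpg')
])

puts artist.to_indexed_json
```

The printed document is flat: the image paths appear as top-level fields next to name and url, which is the shape the question asks for.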
Another way, and the preferable one, is to use the :as option in the mapping block:
mapping do
  indexes :id, :index => :not_analyzed
  indexes :name
  # ...
  # Relationships
  indexes :primary_image_original, :as => 'primary_image_original'
  indexes :account_balance,        :as => 'account_balance'
end
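As far as I understand it, the value given to :as is evaluated in the context of the record being indexed, and it can be either a String or a Proc. The sketch below imitates that behaviour in plain Ruby; resolve_as and the Artist struct are hypothetical names made up for illustration and are not part of Tire.

```ruby
# Plain-Ruby stand-in, NOT Tire's code: resolves an :as value against a record.
def resolve_as(record, as)
  case as
  when Proc   then record.instance_exec(&as)  # block runs with self == record
  when String then record.instance_eval(as)   # string is evaluated as Ruby code
  else as                                     # anything else is used verbatim
  end
end

# Hypothetical record with a name and a list of image paths.
Artist = Struct.new(:name, :images)
artist = Artist.new('Example', ['/img/a.jpg', '/img/b.jpg'])

resolve_as(artist, 'images.first')        # => "/img/a.jpg"
resolve_as(artist, proc { images.size })  # => 2
```

Under that assumption, :as => 'primary_image_original' simply calls the method of that name on each artist at indexing time.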
The slow indexing is most probably caused by N+1 queries in the database: for every artist you index, you issue a query for its images (original and thumbnail). A much more performant way is to load the associated records for the whole batch in one query; see Eager Loading Associations in the Rails Guides.
The Tire Index#import method and the import Rake task allow you to pass parameters, which are then sent to the paginate method down the wire.
So let's compare the naive approach:
bundle exec rake environment tire:import CLASS=Article FORCE=true
Article Load (7.6ms) SELECT "articles".* FROM "articles" LIMIT 1000 OFFSET 0
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE ("comments".article_id = 1)
Comment Load (0.1ms) SELECT "comments".* FROM "comments" WHERE ("comments".article_id = 2)
...
Comment Load (0.3ms) SELECT "comments".* FROM "comments" WHERE ("comments".article_id = 100)
And when we pass the include fragment:
bundle exec rake environment tire:import PARAMS='{:include => ["comments"]}' CLASS=Article FORCE=true
Article Load (8.7ms) SELECT "articles".* FROM "articles" LIMIT 1000 OFFSET 0
Comment Load (31.5ms) SELECT "comments".* FROM "comments" WHERE ("comments".article_id IN (1,2, ... ,100))
Much better :) Please try it out and let me know if it solves your issue.
You can also try it out in the Rails console: Article.import vs. Article.import(include: ['comments']). As a side note, this exact problem was the reason for supporting the params hash in the whole importing toolchain in Tire.
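One caveat for the Artist case: calling images.where(...) builds a fresh query each time, so the methods from the question would still hit the database once per artist even when the images are eager loaded. Filtering the already-loaded collection in Ruby avoids that. The following is a minimal sketch under that assumption; Image and Artist are plain-Ruby stand-ins for the models (in Rails, images would be the preloaded has_many :images association), and the nil handling for artists without a primary image is my own addition.

```ruby
# Hypothetical stand-in for an eager-loaded Image record.
Image = Struct.new(:priority, :original, :thumbnail_150)

# Hypothetical stand-in for the Artist model.
class Artist
  attr_reader :images

  def initialize(images)
    @images = images # in Rails: the preloaded has_many :images collection
  end

  # Searches the loaded collection in memory instead of issuing a query.
  def primary_image
    images.detect { |image| image.priority == 'primary' }
  end

  # Both accessors return nil when the artist has no primary image.
  def primary_image_original
    image = primary_image
    image && image.original
  end

  def primary_image_thumbnail
    image = primary_image
    image && image.thumbnail_150
  end
end

with_primary    = Artist.new([Image.new('primary', '/img/a.jpg', '/img/a_150.jpg')])
without_primary = Artist.new([])

with_primary.primary_image_thumbnail    # => "/img/a_150.jpg"
without_primary.primary_image_original  # => nil
```

With methods written this way, an import that passes include for images should need one images query per batch rather than one per artist.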