Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rails ignores association scope when pre-caching with includes

TL;DR: This question has its own sample app at https://github.com/skensell/SO-question-example which you can use to debug yourself. I put a bounty already once on this question, but I'm not convinced of (or I don't understand) the top answerer's reasoning. I am going to put another bounty on this because it's causing me a lot of frustration.

ORIGINAL QUESTION

I have a model User which has an association like so:

has_many :avatars, -> { order([:sort_order => :asc,:created_at => :asc])}

I have an endpoint which performs a search for users and sets an @users variable to be used by the view. Here comes the spooky part which I found in the debugger:

@users.first.avatars[0..2].map(&:id)
# => [2546, 2547, 2548]
# This is the correct order.

@users.to_a.first.avatars[0..2].map(&:id)
# => [2548, 2546, 2547]
# Wrong order.

What is going on here?

The only difference is to_a. I even tried leaving out the to_a, but I think it's implicitly being called by jbuilder anyways since I set it to a json array.

Maybe the way I'm searching User has something to do with it? I'm using several includes and joins.

UPDATE

Here I can show you a simple example of this weird behavior from the rails console. It seems that the includes..references is the offender, but I don't get why or how.

User.order(id: :desc)
    .includes(:avatars, :industries)
    .where(industries: {id:  [5]})
    .references(:industries)
    .limit(5).to_a.second.avatars.map(&:id)
# => [2751, 2748, 2749]
# Wrong order.

User.order(id: :desc)
    .includes(:avatars, :industries)
    .where(industries: {id:  [5]})
    .references(:industries)
    .limit(5).second.avatars.map(&:id)
# => [2748, 2749, 2751]
# Correct order.

I can verify that these queries refer to the same user, and that the one labeled Correct order is really correct w.r.t sort_order and created_at (which is how the association specifies the ordering).

UPDATE 2

Attached is the requested SQL log. I changed irrelevant fields to "OMITTED" and I replaced 34 irrelevant user fields with a '...'.

>> User.order(id: :desc).includes(:avatars, :industries).where(industries: {id:  [5]}).references(:industries).limit(5).to_a.second.avatars.map(&:id)
SQL (18.5ms)  SELECT  DISTINCT "users"."id", "users"."id" AS alias_0 FROM "users" LEFT OUTER JOIN "avatars" ON "avatars"."user_id" = "users"."id" LEFT OUTER JOIN "user_professions" ON "user_professions"."user_id" = "users"."id" LEFT OUTER JOIN "industries" ON "industries"."id" = "user_professions"."industry_id" WHERE "industries"."id" IN (5)  ORDER BY "users"."id" DESC LIMIT 5
SQL (8.3ms)  SELECT "users"."id" AS t0_r0, "users"."OMITTED" AS t0_r1, "users"."OMITTED" AS t0_r2, ... AS t0_r36, "avatars"."id" AS t1_r0, "avatars"."user_id" AS t1_r1, "avatars"."avatar" AS t1_r2, "avatars"."created_at" AS t1_r3, "avatars"."updated_at" AS t1_r4, "avatars"."OMITTED" AS t1_r5, "avatars"."OMITTED" AS t1_r6, "avatars"."sort_order" AS t1_r7, "industries"."id" AS t2_r0, "industries"."name" AS t2_r1, "industries"."created_at" AS t2_r2, "industries"."updated_at" AS t2_r3 FROM "users" LEFT OUTER JOIN "avatars" ON "avatars"."user_id" = "users"."id" LEFT OUTER JOIN "user_professions" ON "user_professions"."user_id" = "users"."id" LEFT OUTER JOIN "industries" ON "industries"."id" = "user_professions"."industry_id" WHERE "industries"."id" IN (5) AND "users"."id" IN (1526, 945, 927, 888, 884)  ORDER BY "users"."id" DESC
=> [2751, 2748, 2749]

>> User.order(id: :desc).includes(:avatars, :industries).where(industries: {id:  [5]}).references(:industries).limit(5).second.avatars.map(&:id)
SQL (0.9ms)  SELECT  DISTINCT "users"."id", "users"."id" AS alias_0 FROM "users" LEFT OUTER JOIN "avatars" ON "avatars"."user_id" = "users"."id" LEFT OUTER JOIN "user_professions" ON "user_professions"."user_id" = "users"."id" LEFT OUTER JOIN "industries" ON "industries"."id" = "user_professions"."industry_id" WHERE "industries"."id" IN (5)  ORDER BY "users"."id" DESC LIMIT 1 OFFSET 1
SQL (0.8ms)  SELECT "users"."id" AS t0_r0, "users"."OMITTED" AS t0_r1, "users"."OMITTED" AS t0_r2, ... AS t0_r36, "avatars"."id" AS t1_r0, "avatars"."user_id" AS t1_r1, "avatars"."avatar" AS t1_r2, "avatars"."created_at" AS t1_r3, "avatars"."updated_at" AS t1_r4, "avatars"."OMITTED" AS t1_r5, "avatars"."OMITTED" AS t1_r6, "avatars"."sort_order" AS t1_r7, "industries"."id" AS t2_r0, "industries"."name" AS t2_r1, "industries"."created_at" AS t2_r2, "industries"."updated_at" AS t2_r3 FROM "users" LEFT OUTER JOIN "avatars" ON "avatars"."user_id" = "users"."id" LEFT OUTER JOIN "user_professions" ON "user_professions"."user_id" = "users"."id" LEFT OUTER JOIN "industries" ON "industries"."id" = "user_professions"."industry_id" WHERE "industries"."id" IN (5) AND "users"."id" IN (945)  ORDER BY "users"."id" DESC
=> [2748, 2749, 2751]
>>

And here I'll attach a log which shows the user in question's avatars (id, sort_order, and created_at) so you can see that the order should be deterministic.

>> User.find(945).avatars.pluck(:id,:sort_order,:created_at)
User Load (5.3ms)  SELECT  "users".* FROM "users"  WHERE "users"."id" = $1 LIMIT 1  [["id", 945]]
(0.2ms)  SELECT "avatars"."id", "avatars"."sort_order", "avatars"."created_at" FROM "avatars"  WHERE "avatars"."user_id" = $1  ORDER BY "avatars"."sort_order" ASC, "avatars"."created_at" ASC  [["user_id", 945]]
=> [[2748, 0, Fri, 13 Nov 2015 00:32:53 UTC +00:00], [2749, 0, Fri, 13 Nov 2015 00:47:02 UTC +00:00], [2751, 0, Fri, 13 Nov 2015 00:48:05 UTC +00:00]]

Also, I'm using Rails 4.1.4 and Ruby 2.1.10.

UPDATE 3

I have created a sample app here: https://github.com/skensell/SO-question-example . What's even stranger in this sample app is that the to_a doesn't even matter. I get the wrong ordering even with just the includes... references.

like image 368
skensell Avatar asked Sep 06 '16 23:09

skensell


2 Answers

Includes considers the order of the parent table whose records are going to be retrieved only when it results in a join query. ie in the above case, the avatars order is going to be skipped and the users order is going to be used when the includes query results in a join. You can add a default scope to User and confirm.

If you still want the user.avatars to be sorted as per the defined avatar orders, you need to replace includes with joins. Note that using join will retrieve duplicate user records.

A working solution to retrieve data as expected is to use joins and includes together.

Loading development environment (Rails 4.1.4)
2.2.0 :001 > User.count
(0.1ms)  SELECT COUNT(*) FROM "users"
=> 2 
2.2.0 :002 > User.pluck :id, :name
(0.2ms)  SELECT "users"."id", "users"."name" FROM "users"
=> [[1, "John"], [2, "Jill"]] 
2.2.0 :003 > User.first.industries.pluck :id, :name
User Load (0.2ms)  SELECT  "users".* FROM "users"   ORDER BY "users"."id" ASC LIMIT 1
(0.2ms)  SELECT "industries"."id", "industries"."name" FROM "industries" INNER JOIN "user_industries" ON "industries"."id" = "user_industries"."industry_id" WHERE "user_industries"."user_id" = ?  [["user_id", 1]]
=> [[1, "Art"], [2, "Music"]] 
2.2.0 :004 > User.last.industries.pluck :id, :name
User Load (1.4ms)  SELECT  "users".* FROM "users"   ORDER BY "users"."id" DESC LIMIT 1
(0.2ms)  SELECT "industries"."id", "industries"."name" FROM "industries" INNER JOIN "user_industries" ON "industries"."id" = "user_industries"."industry_id" WHERE "user_industries"."user_id" = ?  [["user_id", 2]]
=> [[1, "Art"]] 
2.2.0 :005 > User.first.avatars.pluck :id, :sort_order
User Load (0.4ms)  SELECT  "users".* FROM "users"   ORDER BY "users"."id" ASC LIMIT 1
(0.3ms)  SELECT "avatars"."id", "avatars"."sort_order" FROM "avatars"  WHERE "avatars"."user_id" = ?  ORDER BY "avatars"."sort_order" ASC, "avatars"."created_at" ASC  [["user_id", 1]]
=> [[1, 0], [3, 1], [2, 2]] 
2.2.0 :006 > User.last.avatars.pluck :id, :sort_order
User Load (4.1ms)  SELECT  "users".* FROM "users"   ORDER BY "users"."id" DESC LIMIT 1
(0.2ms)  SELECT "avatars"."id", "avatars"."sort_order" FROM "avatars"  WHERE "avatars"."user_id" = ?  ORDER BY "avatars"."sort_order" ASC, "avatars"."created_at" ASC  [["user_id", 2]]
=> [[4, 5], [6, 6], [5, 7]] 
2.2.0 :007 > ap User.joins(:avatars, :industries).where(industries: {id: [1]}).references(:industries).count
(0.2ms)  SELECT COUNT(*) FROM "users" INNER JOIN "avatars" ON "avatars"."user_id" = "users"."id" INNER JOIN "user_industries" ON "user_industries"."user_id" = "users"."id" INNER JOIN "industries" ON "industries"."id" = "user_industries"."industry_id" WHERE "industries"."id" IN (1)
6
=> nil 
2.2.0 :008 > ap User.joins(:avatars, :industries).where(industries: {id: [1]}).references(:industries).uniq.count
(0.3ms)  SELECT DISTINCT COUNT(DISTINCT "users"."id") FROM "users" INNER JOIN "avatars" ON "avatars"."user_id" = "users"."id" INNER JOIN "user_industries" ON "user_industries"."user_id" = "users"."id" INNER JOIN "industries" ON "industries"."id" = "user_industries"."industry_id" WHERE "industries"."id" IN (1)
2
=> nil 
2.2.0 :009 > ap User.joins(:industries).where(industries: {id: [1]}).references(:industries).count
(0.3ms)  SELECT COUNT(*) FROM "users" INNER JOIN "user_industries" ON "user_industries"."user_id" = "users"."id" INNER JOIN "industries" ON "industries"."id" = "user_industries"."industry_id" WHERE "industries"."id" IN (1)
2
=> nil 
 2.2.0 :010 > User.joins(:industries).where(industries: {id: [1]}).references(:industries).each{|user| ap user.avatars }
User Load (0.3ms)  SELECT "users".* FROM "users" INNER JOIN "user_industries" ON "user_industries"."user_id" = "users"."id" INNER JOIN "industries" ON "industries"."id" = "user_industries"."industry_id" WHERE "industries"."id" IN (1)
Avatar Load (0.2ms)  SELECT "avatars".* FROM "avatars"  WHERE "avatars"."user_id" = ?  ORDER BY "avatars"."sort_order" ASC, "avatars"."created_at" ASC  [["user_id", 1]]
[
  [0] #<Avatar:0x007ff03f8ab448> {
            :id => 1,
       :user_id => 1,
    :sort_order => 0,
    :created_at => Tue, 04 Oct 2016 07:05:36 UTC +00:00,
    :updated_at => Tue, 04 Oct 2016 07:05:44 UTC +00:00
},
[1] #<Avatar:0x007ff03ec7e4e0> {
            :id => 3,
       :user_id => 1,
    :sort_order => 1,
    :created_at => Tue, 04 Oct 2016 07:05:40 UTC +00:00,
    :updated_at => Tue, 04 Oct 2016 07:05:40 UTC +00:00
},
[2] #<Avatar:0x007ff03ec7e2d8> {
            :id => 2,
       :user_id => 1,
    :sort_order => 2,
    :created_at => Tue, 04 Oct 2016 07:05:38 UTC +00:00,
    :updated_at => Tue, 04 Oct 2016 07:05:42 UTC +00:00
}
]
Avatar Load (0.2ms)  SELECT "avatars".* FROM "avatars"  WHERE "avatars"."user_id" = ?  ORDER BY "avatars"."sort_order" ASC, "avatars"."created_at" ASC  [["user_id", 2]]
[
  [0] #<Avatar:0x007ff03f9121e8> {
            :id => 4,
       :user_id => 2,
    :sort_order => 5,
    :created_at => Tue, 04 Oct 2016 07:05:44 UTC +00:00,
    :updated_at => Tue, 04 Oct 2016 07:05:48 UTC +00:00
},
[1] #<Avatar:0x007ff03f911fe0> {
            :id => 6,
       :user_id => 2,
    :sort_order => 6,
    :created_at => Tue, 04 Oct 2016 07:05:48 UTC +00:00,
    :updated_at => Tue, 04 Oct 2016 07:05:48 UTC +00:00
},
[2] #<Avatar:0x007ff03f911dd8> {
            :id => 5,
       :user_id => 2,
    :sort_order => 7,
    :created_at => Tue, 04 Oct 2016 07:05:46 UTC +00:00,
    :updated_at => Tue, 04 Oct 2016 07:05:48 UTC +00:00
}
]
=> [#<User id: 1, name: "John", created_at: "2016-10-04 07:05:40", updated_at: "2016-10-04 07:05:40">, #<User id: 2, name: "Jill", created_at: "2016-10-04 07:05:48", updated_at: "2016-10-04 07:05:48">] 
2.2.0 :011 > User.joins(:industries).where(industries: {id: [1]}).references(:industries).includes(:avatars).each{|user| ap user.avatars }

User Load (0.3ms)  SELECT "users".* FROM "users" INNER JOIN "user_industries" ON "user_industries"."user_id" = "users"."id" INNER JOIN "industries" ON "industries"."id" = "user_industries"."industry_id" WHERE "industries"."id" IN (1)
Avatar Load (0.2ms)  SELECT "avatars".* FROM "avatars"  WHERE "avatars"."user_id" IN (1, 2)  ORDER BY "avatars"."sort_order" ASC, "avatars"."created_at" ASC
[
  [0] #<Avatar:0x007ff03c7f0df8> {
            :id => 1,
       :user_id => 1,
    :sort_order => 0,
    :created_at => Tue, 04 Oct 2016 07:05:36 UTC +00:00,
    :updated_at => Tue, 04 Oct 2016 07:05:44 UTC +00:00
},
[1] #<Avatar:0x007ff03c7f0bf0> {
            :id => 3,
       :user_id => 1,
    :sort_order => 1,
    :created_at => Tue, 04 Oct 2016 07:05:40 UTC +00:00,
    :updated_at => Tue, 04 Oct 2016 07:05:40 UTC +00:00
},
[2] #<Avatar:0x007ff03c7f09c0> {
            :id => 2,
       :user_id => 1,
    :sort_order => 2,
    :created_at => Tue, 04 Oct 2016 07:05:38 UTC +00:00,
    :updated_at => Tue, 04 Oct 2016 07:05:42 UTC +00:00
}
]
[
[0] #<Avatar:0x007ff03c7f07b8> {
            :id => 4,
       :user_id => 2,
    :sort_order => 5,
    :created_at => Tue, 04 Oct 2016 07:05:44 UTC +00:00,
    :updated_at => Tue, 04 Oct 2016 07:05:48 UTC +00:00
},
[1] #<Avatar:0x007ff03c7f0588> {
            :id => 6,
       :user_id => 2,
    :sort_order => 6,
    :created_at => Tue, 04 Oct 2016 07:05:48 UTC +00:00,
    :updated_at => Tue, 04 Oct 2016 07:05:48 UTC +00:00
},
[2] #<Avatar:0x007ff03c7f0380> {
            :id => 5,
       :user_id => 2,
    :sort_order => 7,
    :created_at => Tue, 04 Oct 2016 07:05:46 UTC +00:00,
    :updated_at => Tue, 04 Oct 2016 07:05:48 UTC +00:00
}
]
=> [#<User id: 1, name: "John", created_at: "2016-10-04 07:05:40", updated_at: "2016-10-04 07:05:40">, #<User id: 2, name: "Jill", created_at: "2016-10-04 07:05:48", updated_at: "2016-10-04 07:05:48">] 

Basically, we have 2 eager loading functions: preload and eager_load. When you use includes, it either calls preload or eager_load. preload results in 2 queries(find users & find avatars for retrieved users) wheres eager_load uses only 1 query(the join query). So, when includes results in an join query(ie, results in eager_load), the order of the associations to be retrieved is skipped since its a single query.

User.includes(:avatars, :industries).where(industries: {id: [1]}).references(:industries)

results in join cause you are filtering the users based on particular industries which in itself is a 'through' association. 'through' uses a join. Also, remember that 'joins' results in an INNER JOIN whereas eager_load uses LEFT OUTER JOIN.

2.2.0 :050 > User.joins(:industries).where(industries: {id: [1]}).references(:industries)
User Load (0.2ms)  SELECT "users".* FROM "users" INNER JOIN "user_industries" ON "user_industries"."user_id" = "users"."id" INNER JOIN "industries" ON "industries"."id" = "user_industries"."industry_id" WHERE "industries"."id" IN (1)
=> #<ActiveRecord::Relation [#<User id: 1, name: "John", created_at: "2016-10-04 07:05:40", updated_at: "2016-10-04 07:05:40">, #<User id: 2, name: "Jill", created_at: "2016-10-04 07:05:48", updated_at: "2016-10-04 

2.2.0 :054 >   User.includes(:industries).where(industries: {id: [1]}).references(:industries)
SQL (0.3ms)  SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."created_at" AS t0_r2, "users"."updated_at" AS t0_r3, "industries"."id" AS t1_r0, "industries"."name" AS t1_r1, "industries"."created_at" AS t1_r2, "industries"."updated_at" AS t1_r3 FROM "users" LEFT OUTER JOIN "user_industries" ON "user_industries"."user_id" = "users"."id" LEFT OUTER JOIN "industries" ON "industries"."id" = "user_industries"."industry_id" WHERE "industries"."id" IN (1)
=> #<ActiveRecord::Relation [#<User id: 1, name: "John", created_at: "2016-10-04 07:05:40", updated_at: "2016-10-04 07:05:40">, #<User id: 2, name: "Jill", created_at: "2016-10-04 07:05:48", updated_at: "2016-10-04 07:05:48">]> 

2.2.0 :057 >   User.eager_load(:industries).where(industries: {id: [1]}).references(:industries)
SQL (0.3ms)  SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."created_at" AS t0_r2, "users"."updated_at" AS t0_r3, "industries"."id" AS t1_r0, "industries"."name" AS t1_r1, "industries"."created_at" AS t1_r2, "industries"."updated_at" AS t1_r3 FROM "users" LEFT OUTER JOIN "user_industries" ON "user_industries"."user_id" = "users"."id" LEFT OUTER JOIN "industries" ON "industries"."id" = "user_industries"."industry_id" WHERE "industries"."id" IN (1)

=> #<ActiveRecord::Relation [#<User id: 1, name: "John", created_at: "2016-10-04 07:05:40", updated_at: "2016-10-04 07:05:40">, #<User id: 2, name: "Jill", created_at: "2016-10-04 07:05:48", updated_at: "2016-10-04 07:05:48">]>   

You can refer http://blog.arkency.com/2013/12/rails4-preloading/ for more examples with explanation. I havent found as to why the types of joins are different. Anyways, I hope this helps. I will try to reproduce the same for later versions of rails.

like image 73
prasad.surase Avatar answered Oct 11 '22 17:10

prasad.surase


@users.first.avatars[0..2].map(&:id)
# => [2546, 2547, 2548]

@users.to_a.first.avatars[0..2].map(&:id)
# => [2548, 2546, 2547]
  • What is going on here?

There is nothing wrong.

As per the Guide Retrieving a Single Object Section: 1.1.3 #first

The first method finds the first record ordered by primary key (default)

SELECT * FROM clients ORDER BY clients.id ASC LIMIT 1

And the next methods which return as a collection @users.to_a order by updated_at

Hope this help you!

like image 23
Gupta Avatar answered Oct 11 '22 15:10

Gupta