Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are you allowed to use a find_each query with an includes statement?

example:

Foobar.joins(:baz).includes(:baz).count
=> 22926
Foobar.joins(:baz).includes(:baz).find_each.count
=> 998
Foobar.joins(:baz).find_each.count
=> 22926

The generated sql in the correct case (third) is several batches of sql that looks like:

SELECT  "foobar".* FROM "foobar" INNER JOIN "baz" ON 
"baz"."foobar_id" = "foobar"."id" ORDER BY "foobar"."id" ASC LIMIT $1

in the failing (second) case there is a single query that looks like:

SELECT  "foobar"."id" AS t0_r0
 "baz"."id" AS t1_r0
 "baz"."foobar_id" AS t1_r1
 FROM "foobar" INNER JOIN "baz" ON "baz"."foobar_id" = "foobar"."id" 
 ORDER BY "foobar"."id" ASC LIMIT $1

where all of the fields are listed as a different temporary variable (e.g. t0_r0) for the different columns on each table (in the actual query there are 37 split 30 on the first object, 7 on the second).

Is this a bug? are includes not allowed in a find_each query? Am I doing something wrong?

The relationship between Foobar and Baz is Foobar has_one Baz and Baz belongs_to Foobar.

like image 243
Mike H-R Avatar asked Apr 07 '17 10:04

Mike H-R


1 Answers

This problem can happen if your has_one relationship isn't really has_one.

Say that your database doesn't have a unique index on the column baz.foobar_id. Then you could accidentally end up with a situation like this where you have a Foobar record that is connected to more than one Baz record:

baz.id | baz.foobar_id
------   -------------
1        1
2        1
3        2

In this situation, the joins will return the combination of Foobar and Baz records:

Foobar.joins(:baz).count  # This would be 3

This also means that find_each with join would iterate 3 times and repeat one of the Foobar IDs:

Foobar.joins(:baz).find_each(batch_size: 2) { |f| puts f.id }
# SELECT  "foobar".* FROM "foobar" INNER JOIN "baz" ON... LIMIT 2
1
1
# SELECT  "foobar".* FROM "foobar" INNER JOIN "baz" ON... WHERE ("foobar"."id" > 1) ... LIMIT 2
2

Adding in includes means that Rails is going to try to consolidate the results back into a set of distinct Foobar records. But this won't work with how find_each manages its batches:

Foobar.joins(:baz).includes(:baz).find_each(batch_size: 2) { |f| puts f.id }
# SELECT  "foobar"."id" AS t0_r0 ... LIMIT 2
1

And at this point find_each will stop processing because it has found an early batch is smaller than the batch size, so it thinks it is done:

# ActiveRecord::Batches#in_batches
break if ids.length < batch_limit

The default batch size for find_each is 1,000. Your problem case returned 998 records. That indicates that the first batch it loaded 998 unique Foobar IDs which is less than the batch size, and find_each thought it was done. It probably loaded up 1,000 Baz records which were connected to 998 distinct Foobar records.

You may want to review your baz table to see if it has any duplicate entries. You can do this with something like:

Baz.group(:foobar_id).having('count(*) > 1')

The best solution would be to use a unique index to avoid duplicates in the database and to enforce the has_one relationship. An alternative would be to ensure you're getting a distinct set of Foobar records with something like:

Foobar.group(:id).joins(:baz).includes(:baz).count
Foobar.group(:id).joins(:baz).includes(:baz).find_each.count
Foobar.group(:id).joins(:baz).find_each.count
like image 96
cschroed Avatar answered Sep 25 '22 01:09

cschroed