Example:
Foobar.joins(:baz).includes(:baz).count
=> 22926
Foobar.joins(:baz).includes(:baz).find_each.count
=> 998
Foobar.joins(:baz).find_each.count
=> 22926
The SQL generated in the correct (third) case is several batches of queries that look like:
SELECT "foobar".* FROM "foobar" INNER JOIN "baz" ON
"baz"."foobar_id" = "foobar"."id" ORDER BY "foobar"."id" ASC LIMIT $1
In the failing (second) case there is a single query that looks like:
SELECT "foobar"."id" AS t0_r0,
"baz"."id" AS t1_r0,
"baz"."foobar_id" AS t1_r1
FROM "foobar" INNER JOIN "baz" ON "baz"."foobar_id" = "foobar"."id"
ORDER BY "foobar"."id" ASC LIMIT $1
where every column of each table is selected under its own alias (e.g. t0_r0). In the actual query there are 37 aliases: 30 for the first table's columns and 7 for the second's.
Is this a bug? Is includes not allowed in a find_each query? Am I doing something wrong?
The relationship between Foobar and Baz is: Foobar has_one Baz, and Baz belongs_to Foobar.
This problem can happen if your has_one relationship isn't really has_one.
Say your database doesn't have a unique index on the column baz.foobar_id. Then you could accidentally end up with a situation like this, where a single Foobar record is connected to more than one Baz record:
baz.id | baz.foobar_id
------ | -------------
     1 | 1
     2 | 1
     3 | 2
In this situation, the joins will return one row per Foobar/Baz combination:
Foobar.joins(:baz).count # This would be 3
This also means that find_each with joins would iterate 3 times and repeat one of the Foobar IDs:
Foobar.joins(:baz).find_each(batch_size: 2) { |f| puts f.id }
# SELECT "foobar".* FROM "foobar" INNER JOIN "baz" ON... LIMIT 2
1
1
# SELECT "foobar".* FROM "foobar" INNER JOIN "baz" ON... WHERE ("foobar"."id" > 1) ... LIMIT 2
2
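The duplicated-row arithmetic above can be sketched in plain Ruby, with no database involved (the ids and rows below are made up to match the table earlier):

```ruby
# Plain-Ruby sketch of an INNER JOIN over a non-unique foreign key.
foobar_ids = [1, 2]
baz_rows   = [{ id: 1, foobar_id: 1 },
              { id: 2, foobar_id: 1 },
              { id: 3, foobar_id: 2 }]

# An inner join produces one row per matching pair, so Foobar id 1
# appears twice: three joined rows for only two Foobar records.
joined = baz_rows.map { |b| b[:foobar_id] }
                 .select { |fid| foobar_ids.include?(fid) }
puts joined.inspect  # => [1, 1, 2]
```

This is the in-memory analogue of Foobar.joins(:baz).count returning 3.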
Adding in includes means that Rails is going to try to consolidate the results back into a set of distinct Foobar records. But this won't work with how find_each manages its batches:
Foobar.joins(:baz).includes(:baz).find_each(batch_size: 2) { |f| puts f.id }
# SELECT "foobar"."id" AS t0_r0 ... LIMIT 2
1
And at this point find_each will stop processing, because the batch it just loaded is smaller than the batch size, so it thinks it is done:
# ActiveRecord::Batches#in_batches
break if ids.length < batch_limit
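That early exit can be simulated in plain Ruby. This is not the Rails internals, just the arithmetic: includes deduplicates the joined rows into distinct records, so a full page of join rows can yield fewer than batch_limit ids and trip the break:

```ruby
# Simulate batching over joined rows when deduplication shrinks a batch.
batch_limit = 2
joined_ids  = [1, 1, 2]  # one join row per Baz; Foobar id 1 is duplicated

batches = []
offset = 0
loop do
  page = joined_ids[offset, batch_limit] || []
  ids  = page.uniq                      # includes() collapses duplicate rows
  batches << ids
  break if ids.length < batch_limit     # the same check as in_batches
  offset += batch_limit
end
puts batches.inspect  # => [[1]] — id 2 is never yielded
```

The first page [1, 1] dedupes to [1], 1 < 2, and iteration stops before ever reaching Foobar id 2, which is exactly the 998-of-22926 symptom in the question.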
The default batch size for find_each is 1,000, and your problem case returned 998 records. That suggests the first batch of 1,000 join rows (1,000 Baz rows) collapsed to 998 distinct Foobar IDs, which is less than the batch size, so find_each thought it was done.
You may want to review your baz table to see if it has any duplicate entries. You can do this with something like:
Baz.group(:foobar_id).having('count(*) > 1')
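Outside the database, the same duplicate check amounts to grouping by the foreign key and keeping groups larger than one. A plain-Ruby equivalent, with made-up rows standing in for the baz table:

```ruby
# In-memory analogue of GROUP BY foobar_id HAVING count(*) > 1.
baz_rows = [{ id: 1, foobar_id: 1 },
            { id: 2, foobar_id: 1 },
            { id: 3, foobar_id: 2 }]

dupes = baz_rows.group_by { |b| b[:foobar_id] }
                .select { |_fid, rows| rows.size > 1 }
                .keys
puts dupes.inspect  # => [1] — foobar_id 1 has two Baz rows
```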
The best solution would be to add a unique index, which avoids duplicates in the database and enforces the has_one relationship. An alternative would be to ensure you're getting a distinct set of Foobar records with something like:
Foobar.group(:id).joins(:baz).includes(:baz).count
Foobar.group(:id).joins(:baz).includes(:baz).find_each.count
Foobar.group(:id).joins(:baz).find_each.count
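If you go the unique-index route, a migration along these lines would enforce it (the table and column names are taken from the question; note the index creation will fail until any existing duplicate baz rows are cleaned up first):

```ruby
# Sketch of a migration enforcing the has_one invariant at the DB level.
class AddUniqueIndexToBazFoobarId < ActiveRecord::Migration[7.0]
  def change
    # Rejects any second Baz row pointing at the same Foobar.
    add_index :baz, :foobar_id, unique: true
  end
end
```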