Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing .references requirement on includes vs. eager_load

I know that when you utilize includes and you specify a where clause on the joined table, you should use .references

example:

# will error out or throw deprecation warning in logs
users = User.includes(:orders).where("Orders.cost < ?", 20)

In rails 4 or later, you will get an error like the following:

Mysql2::Error: Unknown column 'Orders.cost' in 'where clause': SELECT customers.* FROM customers WHERE (Orders.cost < 100)

Or you will get a deprecation warning:

DEPRECATION WARNING: It looks like you are eager loading table(s) (one of: users, addresses) that are referenced in a string SQL snippet. For example:

Post.includes(:comments).where("comments.title = 'foo'") Currently, Active Record recognizes the table in the string, and knows to JOIN the comments table to the query, rather than loading comments in a separate query. However, doing this without writing a full-blown SQL parser is inherently flawed. Since we don't want to write an SQL parser, we are removing this functionality. From now on, you must explicitly tell Active Record when you are referencing a table from a string:

Post.includes(:comments).where("comments.title = 'foo'").references(:comments)

If you don't rely on implicit join references you can disable the feature entirely by setting config.active_record.disable_implicit_join_references = true. (

SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4, "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7 FROM "users" LEFT OUTER JOIN "addresses" ON "addresses"."user_id" = "users"."id" WHERE (addresses.country = 'Poland')

so we do this:

# added .references(:orders)
users = User.includes(:orders).where("Orders.cost < ?", 20).references(:orders)

And it executes just fine:

SELECT "users"."id"     AS t0_r0, 
  "users"."name"        AS t0_r1, 
  "users"."created_at"  AS t0_r2, 
  "users"."updated_at"  AS t0_r3, 
  "orders"."id"         AS t1_r0, 
  "orders"."cost"       AS t1_r1, 
  "orders"."user_id"    AS t1_r2, 
  "orders"."created_at" AS t1_r3, 
  "orders"."updated_at" AS t1_r4 
FROM "users" 
LEFT OUTER JOIN "orders" 
ON "orders"."user_id" = "users"."id" 
WHERE  ( orders.cost < 20 ) 

I know that .includes is just a wrapper for two methods: eager_load and preload. I know that since my query above is doing a filter on a joined table (orders in this example), includes is smart and knows to pick the eager_load implementation over preload because preload cannot handle doing this query since preload does not join tables.

Here is where I am confused. Ok: So on that query above: under the hood includes will utilize the eager_load implementation. But notice how when I explicitly use eager_load for this same query (which is what includes is essentially doing): I do not need to use .references! It runs the query and loads the data just fine. No error and no deprecation warning:

# did not specify .references(:orders), and yet no error and no deprecation warning 
users = User.eager_load(:orders).where("Orders.cost < ?", 20)

And it executes the exact same process with no problem:

SELECT "users"."id"     AS t0_r0, 
  "users"."name"        AS t0_r1, 
  "users"."created_at"  AS t0_r2, 
  "users"."updated_at"  AS t0_r3, 
  "orders"."id"         AS t1_r0, 
  "orders"."cost"       AS t1_r1, 
  "orders"."user_id"    AS t1_r2, 
  "orders"."created_at" AS t1_r3, 
  "orders"."updated_at" AS t1_r4 
FROM "users" 
LEFT OUTER JOIN "orders" 
ON "orders"."user_id" = "users"."id" 
WHERE  ( orders.cost < 20 ) 

That seems odd. Why does .references need to be specified for the includes version of the query, whereas .references does not need to be specified for the eager_load version of the query? What am I missing here?

like image 856
Neil Avatar asked Jan 06 '16 16:01

Neil


People also ask

What is the difference between includes and joins?

What is the difference between includes and joins? The most important concept to understand when using includes and joins is they both have their optimal use cases. Includes uses eager loading whereas joins uses lazy loading, both of which are powerful but can easily be abused to reduce or overkill performance.

What is include in Rails?

Rails provides an ActiveRecord method called :includes which loads associated records in advance and limits the number of SQL queries made to the database. This technique is known as "eager loading" and in many cases will improve performance by a significant amount.

What is N 1 query problem rails?

The n+1 query problem is one of the most common scalability bottlenecks. It involves fetching a list of resources from a database that includes other associated resources within them. This means that we might have to query for the associated resources separately.

What is eager loading in Rails?

Eager loading is a way to find objects of a certain class and a number of named associations. Here I share my thoughts on using it with Rails. What are N + 1 queries? It mainly occurs when you load the bunch of objects and then for each object you make one more query to find associated object.


1 Answers

It comes down to the problem they mention in the deprecation warning:

Currently, Active Record recognizes the table in the string, and knows to JOIN the comments table to the query, rather than loading comments in a separate query. However, doing this without writing a full-blown SQL parser is inherently flawed. Since we don't want to write an SQL parser, we are removing this functionality.

In older versions, Rails tried to be helpful about selecting the query pattern to use, and includes would use the preload strategy if it could, but switch to the eager_load strategy when it looks like you're referencing something in a joined table. But without a full SQL parser figuring out what tables are actually referenced, it's like parsing XHTML with a Regex - you can get some things done, but Rails can't decide correctly in every case. Consider:

User.includes(:orders).where("Orders.cost < 20")

This is a nice, simple example, and Rails could tell that you need Orders joined. Now try this one:

User.includes(:orders).where("id IN (select user_id from Orders where Orders.cost < 20)")

This gives the same result, but the subquery rendered joining Orders unnecessary. It's a contrived example, and I don't know whether Rails would decide the second query needed to join or not, but the point is there are cases when the heuristic could make the wrong decision. In those cases, either Rails would perform an unnecessary join, burning memory and slowing the query down, or not perform a necessary join, causing an error.

Rather than maintain a heuristic with a pretty bad failure case, the developers decided to just ask the programmer whether the join is needed. You're able to get it right more often than Rails can (hopefully), and when you get it wrong, it's clear what to change.

Instead of adding references you could switch to eager_load, but keeping includes and references separate allows the implementation flexibility in its query pattern. You could conceivably .includes(:orders, :addresses).references(:orders) and have addresses loaded in a second preload-style query because it's not needed during the join (though Rails actually just includes addresses in the join anyway). You don't need to specify references when you're using eager_load because eager_load always joins, where preload always does multiple queries. All references does is instruct includes to use the necessary eager_load strategy and specify which tables are needed.

like image 86
Kristján Avatar answered Oct 02 '22 12:10

Kristján