Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does Slick generate a subquery when take() method is called

Tags:

sql

scala

slick

I use Slick 1.0.0-RC1. I have this definition for table object:

object ProductTable extends Table[(Int, String, String, String, Double, java.sql.Date, Int, Option[Int], Int, Boolean)]("products") {
  def id = column[Int]("productId", O.PrimaryKey, O.AutoInc)
  def title = column[String]("title")
  def description = column[String]("description")
  def shortDescription = column[String]("shortDescription")
  def price = column[Double]("price")
  def addedDate = column[java.sql.Date]("addedDate")
  def brandId = column[Int]("brandId")
  def defaultImageId = column[Option[Int]]("defaultImageId")
  def visitCounter = column[Int]("visitCounter")
  def archived = column[Boolean]("archived")
  def * = id ~ title ~ description ~ shortDescription ~ price ~ addedDate ~ brandId ~ defaultImageId ~ visitCounter ~ archived
}

I need a simple query which selects 8 rows from database:

ProductTable.filter(_.title === "something")
  .sortBy(_.visitCounter)
  .map(_.title)
  .take(8)
  .selectStatement

And the output is:

select x2.x3 from 
   (select x4.`title` as x3 from `products` x4 
     where x4.`title` = 'something' 
     order by x4.`visitCounter` limit 8) x2

If I get rid of take() method:

ProductTable.filter(_.title === "something")
 .sortBy(_.visitCounter)
 .map(_.title)
 .selectStatement

then the output is:

select x2.`title` from `products` x2 
where x2.`title` = 'something' 
order by x2.`visitCounter`

So my question is: Why does Slick generate a subquery when its query object is constructed with take() method?

P.S. If it can be related, I use MySql driver with all of these

like image 667
wassertim Avatar asked Jan 20 '13 15:01

wassertim


1 Answers

There's a short answer and a long one. The short one is: The subquery is there because so far nobody bothered to remove it.

The longer answer is related to that fact that swapping "map" and "take" does not make any difference, which is due to the compilation of queries being all but simple and straight-forward.

Relatively early in the query compiler there's a "forceOuterBinds" phase which introduces (potentially) lots of extra Bind (a.k.a. flatMap) operations that are semantically equivalent and redundant. The idea is to take some x which has a collection type and turn it into Bind(s, x, Pure(s)) where s is a fresh Symbol. If x is already of the shape Pure(y), we turn it into Bind(s, Pure(ProductNode()), Pure(y)) instead. In Scala code, think of this as turning an x: List[T] into x.flatMap(s => List(s)) or a List(y) into List(()).flatMap(s => List(y)). The purpose of this transformation is to make tree rewriting in the later compiler phases easier by giving us a place to modify the projection (which was created as an identity mapping) in all the places where we might want to do that.

Later on, in "convertToComprehensions", all nodes from the monadic form (Bind, Pure, Filter, Take, Drop, etc.) are converted individually to Comprehension nodes (representing a SQL select statement). The result is still not legal SQL though: SQL comprehensions are not monad comprehensions. They have very limiting scope rules that do not allow a from clause to reference a variable introduced by a previous from clause (in the same comprehension or an enclosing comprehension).

That's why we need the next phase, "fuseComprehensions", which may look purely like an optimization at first glance but is actually required to generate correct code. This phase tries to fuse the individual comprehensions as much as possible to avoid these illegal references. We've made some progress with what we can fuse but a 100% solution to the scope problem is not in sight (and in fact, I am pretty sure that it is impossible to solve).

To reiterate that, progress in this phase was mostly driven by the need for correctness, not just generating nicer code. So could we remove that extra subquery? Yes, certainly, but nobody has implemented it yet.

If you want to try to implement such an optimization, here are some caveats to think about:

  • Are you content with removing purely aliasing projections (i.e. a Comprehension where the select slot has the form Some(ProductNode(ch)) where each element of ch is a Path)?
  • Or maybe you think that select x+1 from (... limit ...)) should also be fused. What kinds of expressions can you allow? E.g. would a RowNum be OK?
  • What kind of shape does the subquery need to have? For example, may it contain groupBy or orderBy clauses?

(And something for me to think about: How long would it take to implement that optimization compared to explaining why it doesn't yet exist?)

like image 170
szeiger Avatar answered Nov 13 '22 07:11

szeiger