
Nested Selects Or JOINS?

Tags:

sql

sql-server

This is a snippet from my stored procedure:

SELECT  NULL AS StoryID
      , AlbumID
      , CAST(NULL as varchar) AS StoryTitle
      , AlbumName
      , (SELECT URL FROM AlbumPictures AS AlbumPictures_3 WHERE (AlbumID = Albums.AlbumID) AND (AlbumCover = 'True')) AS AlbumCover
      , Votes
      , CAST(NULL as Int) AS PictureId
      , 'albums' AS tableName
      , (SELECT NestedAlbums.AlbumID FROM NestedAlbums WHERE (AlbumID = Albums.AlbumID)) AS Flag
INTO #Results2
FROM Albums WHERE AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))

I have used nested selects in the query above. Are nested selects better than LEFT/RIGHT JOINs, or should I use JOINs?

Table Albums:

[Image: table Albums]

Table NestedAlbums:

[Image: table NestedAlbums]

asked Jan 06 '13 16:01 by user1593175


2 Answers

In general, writing this as an explicit OUTER JOIN would be better.

With the subquery version, SQL Server will probably need to add an Assert operator to the plan to verify that each subquery returns at most one row (unless this is guaranteed by a unique index). This can limit the transformations available to the optimizer. See Scalar Subqueries for more about this.
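To see that single-row check fire, here is a minimal sketch. It assumes SQL Server 2016 or later and uses hypothetical temp tables `#Albums` and `#AlbumPictures`. The column names are borrowed from the question, and the sample rows are invented. Album 1 deliberately has two rows flagged as its cover. In that case the scalar subquery fails at run time with error 512 instead of returning extra rows. The TRY/CATCH only records the error number.

```sql
-- Assumes SQL Server 2016+ (DROP TABLE IF EXISTS, THROW).
-- #Albums and #AlbumPictures are hypothetical stand-ins for the real tables.
DROP TABLE IF EXISTS #Albums, #AlbumPictures, #ErrorSeen;

CREATE TABLE #Albums (AlbumID int PRIMARY KEY, AlbumName varchar(50));
CREATE TABLE #AlbumPictures (
    PictureId  int PRIMARY KEY,
    AlbumID    int NOT NULL,
    URL        varchar(200) NOT NULL,
    AlbumCover varchar(5) NOT NULL      -- 'True' / 'False', as in the question
);
CREATE TABLE #ErrorSeen (ErrorNumber int);

INSERT INTO #Albums VALUES (1, 'Holiday'), (2, 'Wedding');
-- Album 1 has two cover rows; nothing here enforces "one cover per album".
INSERT INTO #AlbumPictures VALUES
    (10, 1, 'a.jpg', 'True'),
    (11, 1, 'b.jpg', 'True'),
    (20, 2, 'c.jpg', 'True');

BEGIN TRY
    SELECT A.AlbumID,
           (SELECT AP.URL
            FROM   #AlbumPictures AP
            WHERE  AP.AlbumID = A.AlbumID
                   AND AP.AlbumCover = 'True') AS AlbumCover
    FROM   #Albums A;
END TRY
BEGIN CATCH
    -- Expected: 512, "Subquery returned more than 1 value..."
    INSERT INTO #ErrorSeen (ErrorNumber) VALUES (ERROR_NUMBER());
END CATCH;
```

There is no unique index on `(AlbumID, AlbumCover)` here. The actual execution plan of that SELECT is therefore where you would expect to find the Assert operator performing the check.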

Also (though not relevant to the example in your question, as the two subqueries are different), writing an explicit JOIN lets you use multiple columns from the joined table with a single lookup. Separate but similar subqueries would each perform their own lookup, because SQL Server has no logic to detect the common subexpressions.
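A minimal sketch of that point follows. It assumes SQL Server 2016 or later and hypothetical temp tables. `PictureId` is assumed to be a column of AlbumPictures, which the question does not show. Fetching two columns of the cover picture with scalar subqueries takes two separate lookups. The join gets both from one lookup. Because every album here has exactly one cover, both forms return the same rows.

```sql
-- Assumes SQL Server 2016+; all tables and rows are hypothetical.
DROP TABLE IF EXISTS #Albums, #AlbumPictures, #ViaSubqueries, #ViaJoin;

CREATE TABLE #Albums (AlbumID int PRIMARY KEY, AlbumName varchar(50));
CREATE TABLE #AlbumPictures (
    PictureId  int PRIMARY KEY,
    AlbumID    int NOT NULL,
    URL        varchar(200) NOT NULL,
    AlbumCover varchar(5) NOT NULL
);

INSERT INTO #Albums VALUES (1, 'Holiday'), (2, 'Wedding');
INSERT INTO #AlbumPictures VALUES
    (10, 1, 'a.jpg', 'True'),
    (12, 1, 'x.jpg', 'False'),
    (20, 2, 'c.jpg', 'True');

-- Two near-identical subqueries: each is a separate lookup into #AlbumPictures.
SELECT A.AlbumID,
       (SELECT AP.URL FROM #AlbumPictures AP
        WHERE AP.AlbumID = A.AlbumID AND AP.AlbumCover = 'True')       AS CoverURL,
       (SELECT AP.PictureId FROM #AlbumPictures AP
        WHERE AP.AlbumID = A.AlbumID AND AP.AlbumCover = 'True')       AS CoverPictureId
INTO   #ViaSubqueries
FROM   #Albums A;

-- One join: both columns come from the same lookup.
SELECT A.AlbumID,
       AP.URL       AS CoverURL,
       AP.PictureId AS CoverPictureId
INTO   #ViaJoin
FROM   #Albums A
       LEFT OUTER JOIN #AlbumPictures AP
         ON AP.AlbumID = A.AlbumID
            AND AP.AlbumCover = 'True';
```

Compare the two execution plans to see the extra access to `#AlbumPictures` in the subquery form.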

Edit:

Following the discussion in the comments, something like:

SELECT NULL                      AS StoryID,
       A.AlbumID,
       CAST(NULL AS VARCHAR(30)) AS StoryTitle,
       A.AlbumName,
       AP.URL                    AS AlbumCover,
       A.Votes,
       CAST(NULL AS INT)         AS PictureId,
       'albums'                  AS tableName,
       CASE
         WHEN EXISTS (SELECT *
                      FROM   NestedAlbums NA
                      WHERE  NA.AlbumID = A.AlbumID
                             AND ( AccountId = @AccountId )) THEN 1
         ELSE 0
       END                       AS Flag
INTO   #Results2
FROM   Albums A
       LEFT OUTER JOIN AlbumPictures AP
         ON ( AP.AlbumID = A.AlbumID )
            AND ( AP.AlbumCover = 'True' )
WHERE  A.AlbumID IN (SELECT StringVal
                     FROM   funcListToTableInt(@whereAlbumID)) 

You may notice this still has a subquery in the SELECT list, but CASE ... EXISTS will be implemented efficiently as a semi join.
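A small sketch of why CASE ... EXISTS is safe here follows. It assumes SQL Server 2016 or later and hypothetical temp tables. The `NestedAlbumID` column is invented, and the `@AccountId` filter from the query above is left out for brevity. Album 1 appears twice in `#NestedAlbums`, but EXISTS only asks whether a match exists. Each album therefore still yields exactly one row with a 1/0 flag. With this data, the question's original `Flag` subquery would instead fail with error 512 for album 1.

```sql
-- Assumes SQL Server 2016+; all tables and rows are hypothetical.
DROP TABLE IF EXISTS #Albums, #NestedAlbums, #Flags;

CREATE TABLE #Albums (AlbumID int PRIMARY KEY, AlbumName varchar(50));
-- Hypothetical: the same AlbumID may be listed more than once.
CREATE TABLE #NestedAlbums (NestedAlbumID int PRIMARY KEY, AlbumID int NOT NULL);

INSERT INTO #Albums VALUES (1, 'Holiday'), (2, 'Wedding');
INSERT INTO #NestedAlbums VALUES (100, 1), (101, 1);

SELECT A.AlbumID,
       CASE
         WHEN EXISTS (SELECT *
                      FROM   #NestedAlbums NA
                      WHERE  NA.AlbumID = A.AlbumID) THEN 1
         ELSE 0
       END AS Flag          -- 1 if at least one match, never extra rows
INTO   #Flags
FROM   #Albums A;
```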

At the moment, your query assumes that at most one matching row per album will be returned from AlbumPictures, and it would error out if this assumption did not hold. The rewrite above changes the semantics: no error is returned, and you would get multiple rows, one for each matching URL. If you don't want that, you need to define which URL to use and add a GROUP BY.
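One way to pick a single URL per album is sketched below. It assumes that taking the lowest URL by sort order is acceptable. `MIN(URL)` is only a placeholder for whatever rule you actually want. The sketch also assumes SQL Server 2016 or later and hypothetical temp tables. The GROUP BY inside the derived table leaves at most one cover row per album, so the LEFT OUTER JOIN can no longer add rows.

```sql
-- Assumes SQL Server 2016+; all tables and rows are hypothetical.
DROP TABLE IF EXISTS #Albums, #AlbumPictures, #Results;

CREATE TABLE #Albums (AlbumID int PRIMARY KEY, AlbumName varchar(50));
CREATE TABLE #AlbumPictures (
    PictureId  int PRIMARY KEY,
    AlbumID    int NOT NULL,
    URL        varchar(200) NOT NULL,
    AlbumCover varchar(5) NOT NULL
);

INSERT INTO #Albums VALUES (1, 'Holiday'), (2, 'Wedding'), (3, 'Empty');
INSERT INTO #AlbumPictures VALUES
    (10, 1, 'b.jpg', 'True'),   -- album 1 has two covers
    (11, 1, 'a.jpg', 'True'),
    (20, 2, 'c.jpg', 'True');   -- album 3 has no pictures at all

SELECT A.AlbumID,
       A.AlbumName,
       AP.URL AS AlbumCover
INTO   #Results
FROM   #Albums A
       LEFT OUTER JOIN (SELECT AlbumID,
                               MIN(URL) AS URL      -- placeholder rule: one URL per album
                        FROM   #AlbumPictures
                        WHERE  AlbumCover = 'True'
                        GROUP  BY AlbumID) AP
         ON AP.AlbumID = A.AlbumID;
```

Album 3 is kept with a NULL cover, as with the original scalar subquery.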

answered Oct 23 '22 11:10 by Martin Smith


The first difference is that the FROM clause governs the initial cardinality of the results.

In your case, the results will have one row per row in Albums. Scalar subqueries in the SELECT clause cannot change this. If a subquery happens to return multiple rows, SQL Server will throw an exception; it will never add rows to the results.

When you move this logical join to the FROM clause, you redefine your initial cardinality. It is no longer one row per row in Albums, but one row per row in Albums LEFT OUTER JOIN AlbumPictures ON ... etc. If that yields multiple rows per row in Albums, SQL Server will not throw an exception as it does for the subselect. Rather, it will add rows to the results.
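The fan-out is easy to reproduce. This sketch assumes SQL Server 2016 or later and uses hypothetical temp tables and rows. Album 1 has two cover rows, so the LEFT OUTER JOIN returns three rows for two albums and raises no error. The same data makes the scalar-subquery form fail with error 512.

```sql
-- Assumes SQL Server 2016+; all tables and rows are hypothetical.
DROP TABLE IF EXISTS #Albums, #AlbumPictures, #Joined;

CREATE TABLE #Albums (AlbumID int PRIMARY KEY, AlbumName varchar(50));
CREATE TABLE #AlbumPictures (
    PictureId  int PRIMARY KEY,
    AlbumID    int NOT NULL,
    URL        varchar(200) NOT NULL,
    AlbumCover varchar(5) NOT NULL
);

INSERT INTO #Albums VALUES (1, 'Holiday'), (2, 'Wedding');
INSERT INTO #AlbumPictures VALUES
    (10, 1, 'a.jpg', 'True'),
    (11, 1, 'b.jpg', 'True'),
    (20, 2, 'c.jpg', 'True');

SELECT A.AlbumID,
       AP.URL AS AlbumCover
INTO   #Joined
FROM   #Albums A
       LEFT OUTER JOIN #AlbumPictures AP
         ON AP.AlbumID = A.AlbumID
            AND AP.AlbumCover = 'True';

-- Album 1 now appears twice: the FROM clause, not #Albums alone, sets the row count.
SELECT AlbumID, COUNT(*) AS RowsPerAlbum
FROM   #Joined
GROUP  BY AlbumID;
```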

So in this regard, the subquery does a better job of expressing intent, and a better job of proactively protecting you from data that violates that intent: "give me one row per album, and for each album, include the URL from here, the nested ID from there," etc.

HOWEVER, functionally speaking, there is a huge drawback: scalar subqueries cannot return whole tuples. You've gone through all this work to write the subquery, SQL Server has gone through all this work to execute it, and now you are limited to this one single scalar return value! Sometimes that's fine, but sometimes you need more. When you need more, you need the FROM clause.

The closest FROM-clause equivalent to a scalar subquery is not OUTER JOIN, but rather the wondrous OUTER APPLY. OUTER APPLY is not a scalar expression: it returns whole tuples, and any number of rows.

First approximation:

SELECT Albums.*, AlbumPictures.URL, NestedAlbums.AlbumID
FROM Albums
OUTER APPLY (
  SELECT TOP (1) * FROM AlbumPictures
  WHERE (AlbumID = Albums.AlbumID) AND (AlbumCover = 'True')
  ) AlbumPictures
OUTER APPLY (
  SELECT TOP (1) * FROM NestedAlbums 
  WHERE (AlbumID = Albums.AlbumID)
  ) NestedAlbums 
WHERE Albums.AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))

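So, by virtue of TOP (1), Albums still governs the initial cardinality of the results. However, we now have access to all the columns from the related tables, and this is awesome. (Note that TOP (1) without an ORDER BY returns an arbitrary matching row when there is more than one.)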
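To make the choice of row deterministic, you can put an ORDER BY inside the applied subquery. This minimal sketch assumes SQL Server 2016 or later and hypothetical temp tables. Ordering by `PictureId` is an assumed tie-breaker. The sketch also shows two columns coming back from the single applied row.

```sql
-- Assumes SQL Server 2016+; all tables and rows are hypothetical.
DROP TABLE IF EXISTS #Albums, #AlbumPictures, #Applied;

CREATE TABLE #Albums (AlbumID int PRIMARY KEY, AlbumName varchar(50));
CREATE TABLE #AlbumPictures (
    PictureId  int PRIMARY KEY,
    AlbumID    int NOT NULL,
    URL        varchar(200) NOT NULL,
    AlbumCover varchar(5) NOT NULL
);

INSERT INTO #Albums VALUES (1, 'Holiday'), (2, 'Wedding'), (3, 'Empty');
INSERT INTO #AlbumPictures VALUES
    (11, 1, 'b.jpg', 'True'),
    (10, 1, 'a.jpg', 'True'),
    (20, 2, 'c.jpg', 'True');

SELECT A.AlbumID,
       Cover.PictureId,
       Cover.URL
INTO   #Applied
FROM   #Albums A
       OUTER APPLY (SELECT TOP (1) AP.PictureId, AP.URL
                    FROM   #AlbumPictures AP
                    WHERE  AP.AlbumID = A.AlbumID
                           AND AP.AlbumCover = 'True'
                    ORDER  BY AP.PictureId) Cover;   -- lowest PictureId wins
```

Album 3 has no cover and is still returned, with NULL in both applied columns, just as an outer join would return it.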

Then, if we are confident that TOP (1) is not necessary (that, by virtue of keys and indexes, the subquery can only ever return one row), we can rewrite using a simpler form:

SELECT Albums.*, AlbumPictures.URL, NestedAlbums.AlbumID
FROM Albums
OUTER APPLY (
  SELECT * FROM AlbumPictures
  WHERE (AlbumID = Albums.AlbumID) AND (AlbumCover = 'True')
  ) AlbumPictures
OUTER APPLY (
  SELECT * FROM NestedAlbums 
  WHERE (AlbumID = Albums.AlbumID)
  ) NestedAlbums 
WHERE Albums.AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))

This is now logically equivalent to a LEFT OUTER JOIN:

SELECT Albums.*, AlbumPictures.URL, NestedAlbums.AlbumID
FROM Albums
LEFT OUTER JOIN AlbumPictures 
  ON AlbumPictures.AlbumID = Albums.AlbumID 
 AND AlbumPictures.AlbumCover = 'True'
LEFT OUTER JOIN NestedAlbums 
  ON NestedAlbums.AlbumID = Albums.AlbumID
WHERE Albums.AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))

And there you have it. Which is better? Well, whatever you do, please keep it simple.

PERFORMANCE-WISE, there is generally not a whole heck of a lot of difference between the forms. You can compare execution plans side by side for your particular tables and indexes. It's a good learning experience to see how SQL Server rewrites logically equivalent queries. I'd expect to see identical plans for OUTER APPLY (without TOP (1)) and LEFT OUTER JOIN.

answered Oct 23 '22 11:10 by Peter Radocchia