Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Avoid repeating a subquery that references multiple joined tables

I have a subquery (LastActivityOn) that I'd like to use in three places, my projection (SELECTed output), ORDER BY, and WHERE clause.

SELECT TOP 175
  (SELECT MAX(ActivityDate) FROM (VALUES
    (UserRegistration.CreatedOn),
    (UserRegistration.ActivatedOn),
    (UserRegistration.LastLoginOn),
    (UserRegistration.UpdatedOn),
    (UserProfile.LastPostedOn)) AS AllDates(ActivityDate)) LastActivityOn,
  UserRegistration.FirstName,
  UserRegistration.LastName,
  [15 more columns of various calculated distances, coalesces, etc...]
FROM
  UserRegistration
  INNER JOIN UserProfile ON UserRegistration.Id = UserProfile.RegistrationId
  INNER JOIN (
    SELECT PostalCode, GeoCenter, PrimaryCity, StateOrProvince
    FROM PostalCodes 
    WHERE @OriginPostalCode IS NULL OR PostalCodes.GeoCenter.STDistance(@OriginPoint) < @WithinMeters
  ) AS ProximalPostalCodes ON ProximalPostalCodes.PostalCode = UserRegistration.PostalCode
  [7 more joins including full-text queries]
WHERE
  LastActivityOn > @OldestUserToSearch AND
  [20 more lines of filtering logic]
ORDER BY
  LOG(DATEDIFF(WEEK, LastActivityOn, @Today))/LOG(2),
  FullTextRelevance

Note the three occurrences of LastActivityOn. Also note that the LastActivityOn subquery references two tables. I suppose because it depends on the join clause in the parent query, it is inherently a correlated subquery?

When I was only taking the maximum of two dates via a User-Defined-Function, I was able to use the resulting value in my WHERE and ORDER BY. Now I cannot.

It seems like I have a few options... I could wrap the whole thing in another query, repeating the projection with just the added activity. It seems like I may be able to use "WITH" (a CTE) in the same way.

But because I don't understand clearly the rules of when I can and cannot use a subquery the way I want to, I could easily be missing something. Any ideas?

Or maybe SQL SERVER will be smart enough to only perform the calculation once for each output row, and I shouldn't worry about it?

EDIT: Currently running SQL Server 2008 Standard, but an upgrade will be in-order at some point. Also, RE: the log function - I'm working to combine with with relevance as a weighted total, so that's a work-in-progress. I'll either trim it with INT to use as a type of ranking, or add it to relevance with a linear adjustment.

CORRECTION: I was able to use the subquery alias in my ORDER BY, but not with any additional calculations or in the where clause. Thanks to ypercube for pointing that out.

like image 309
shannon Avatar asked Dec 27 '22 20:12

shannon


2 Answers

I don't try to modify your query, but may be common table expression is what you need.

like image 179
ceth Avatar answered May 04 '23 00:05

ceth


You cannot use the LastActivityOn alias in the WHERE clause but you can use it in the ORDER BY.

If you want to not repeat the code in 2 places (SELECT and WHERE), you can either use a CTE or select this LastActivityOn result - and the whole subquery - in a derived table and then use it in the external level:

SELECT TOP 175
  LastActivityOn,
  FirstName,
  LastName,
  ...
FROM
    ( SELECT
        ( SELECT MAX(ActivityDate) 
          FROM 
            ( VALUES
                (UserRegistration.CreatedOn),
                (UserRegistration.ActivatedOn),
                (UserRegistration.LastLoginOn),
                (UserRegistration.UpdatedOn),
                (UserProfile.LastPostedOn)
            ) AS AllDates(ActivityDate)
        ) LastActivityOn,
        UserRegistration.FirstName,
        UserRegistration.LastName,
        [15 more columns of various calculated distances, coalesces, etc...]
      FROM
        UserRegistration
        INNER JOIN UserProfile ON UserRegistration.Id = UserProfile.RegistrationId
        INNER JOIN (
          SELECT PostalCode, GeoCenter, PrimaryCity, StateOrProvince
          FROM PostalCodes 
          WHERE @OriginPostalCode IS NULL 
             OR PostalCodes.GeoCenter.STDistance(@OriginPoint) < @WithinMeters
        ) AS ProximalPostalCodes 
            ON ProximalPostalCodes.PostalCode = UserRegistration.PostalCode
        [7 more joins including full-text queries]
      WHERE
        [20 or more lines of filtering logic]
    ) AS tmp
WHERE
  LastActivityOn > @OldestUserToSearch AND
  [any of the 20 lines that has "LastActivityO"]
ORDER BY
  LOG(DATEDIFF(WEEK, LastActivityOn, @Today))/LOG(2),
  FullTextRelevance ;

SQL-Server will probably be enough clever and not execute the same code twice, but that may depend on the version you are running. The optimizer has advanced a lot from version 2000 to 2012 (and the Express or other editions may not have the same capabilities as the Standard or the Enterprise edition)


Irrelevant to the question but I think that because the LOG() function is monotonic, the:

ORDER BY
  LOG(DATEDIFF(WEEK, LastActivityOn, @Today))/LOG(2)

is equivalent to the simpler:

ORDER BY
  DATEDIFF(WEEK, LastActivityOn, @Today))
like image 34
ypercubeᵀᴹ Avatar answered May 04 '23 01:05

ypercubeᵀᴹ