Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Outer Apply Returning columns unexpectedly NOT NULL when no match

I'm hitting some weird behavior on a table valued function when used with OUTER APPLY. I have a simple inline function that returns some simple calculations based on a row in another table. When the input values for the TVF are hard-coded scalars, there is no row returned. When I take the same scalars and make a single row out of them in a CTE, then feed them in as columns using CROSS APPLY, no result set. When I do the same with OUTER APPLY, I get 1 row (as expected), but two of the output columns are NULL and the other two NOT NULL. Based on BOL, that shouldn't happen with an OUTER APPLY. Is this a user error? I wrote a simple version to demonstrate the issue.

--Test set-up
CREATE FUNCTION dbo.TVFTest
(
       @keyID INT,
       @matchValue1 MONEY,
       @matchValue2 MONEY
)
RETURNS TABLE AS RETURN
(
WITH TestRow
     AS (SELECT @keyID       AS KeyID,
                @matchValue1 AS MatchValue1,
                @matchValue2 AS MatchValue2)
SELECT KeyID,
       MatchValue1,
       MatchValue2,
       CASE
         WHEN MatchValue1 <> MatchValue2
           THEN 'Not equal'
         ELSE 'Something else'
       END AS MatchTest
FROM   TestRow
WHERE  MatchValue1 <> MatchValue2 
)
GO

Query

WITH Test AS
(
       SELECT 12 AS PropertyID,
              $350000 AS Ap1,
              350000 AS Ap2
)
SELECT LP.*
FROM Test T
OUTER APPLY dbo.TVFTest
(
       T.PropertyID,
       T.Ap1,
       T.Ap2
) LP;

Results

+-------+-------------+-------------+-----------+
| KeyID | MatchValue1 | MatchValue2 | MatchTest |
+-------+-------------+-------------+-----------+
|    12 | 350000.00   | NULL        | NULL      |
+-------+-------------+-------------+-----------+

Using Cross Apply returns no rows as expected. Also removing the CTE and using inline constants returns no row.

--Scalars, no row here...
SELECT LP.*
FROM dbo.TVFTest
(
       12,
       $350000,
       350000
) LP;
like image 714
CDC Avatar asked Sep 04 '15 22:09

CDC


People also ask

What does outer apply do in SQL?

The OUTER APPLY operator returns all the rows from the left table expression irrespective of its match with the right table expression. For those rows for which there are no corresponding matches in the right table expression, it contains NULL values in columns of the right table expression.

What is difference between cross apply and outer?

Thus, the CROSS APPLY is similar to an INNER JOIN, or, more precisely, like a CROSS JOIN with a correlated sub-query with an implicit join condition of 1=1. The OUTER APPLY operator returns all the rows from the left table expression regardless of its match with the right table expression.

Is Outer apply same as LEFT join?

OUTER APPLY resembles LEFT JOIN, but has an ability to join table-evaluated functions with SQL Tables. OUTER APPLY's final output contains all records from the left-side table or table-evaluated function, even if they don't match with the records in the right-side table or table-valued function.

What is difference between cross apply and inner join?

In simple terms, a join relies on self-sufficient sets of data, i.e. sets should not depend on each other. On the other hand, CROSS APPLY is only based on one predefined set and can be used with another separately created set. A worked example should help with understanding this difference.


2 Answers

This is certainly a bug in the product.

A similar bug was already reported and closed as "Won't Fix".

Including this question, the linked connect item and another two questions on this site I have seen four cases of this type of behaviour with inline TVFs and OUTER APPLY - All of them were of the format

OUTER APPLY dbo.SomeFunction(...) F

And returned correct results when written as

OUTER APPLY (SELECT * FROM dbo.SomeFunction(...)) F

So this looks like a possible workaround.

For the query

WITH Test AS
(
       SELECT 12 AS PropertyID,
              $350000 AS Ap1,
              350000 AS Ap2
)
SELECT LP.*
FROM Test T
OUTER APPLY dbo.TVFTest
(
       T.PropertyID,
       T.Ap1,
       T.Ap2
) LP;

The execution plan looks like

enter image description here

And the list of output columns in the final projection is. Expr1000, Expr1001, Expr1003, Expr1004.

However only two of those columns are defined in the table of constants in the bottom right.

The literal $350000 is defined in the table of constants in the top right (Expr1001). This then gets outer joined onto the table of constants in the bottom right. As no rows match the join condition the two columns defined there (Expr1003, Expr1004) are correctly evaluated as NULL. then finally the compute scalar adds the literal 12 into the data flow as a new column (Expr1000) irrespective of the result of the outer join.

These are not at all the correct semantics. Compare with the (correct) plan when the inline TVF is manually inlined.

WITH Test
     AS (SELECT 12      AS PropertyID,
                $350000 AS Ap1,
                350000  AS Ap2)
SELECT LP.*
FROM   Test T
       OUTER APPLY (SELECT KeyID,
                           MatchValue1,
                           MatchValue2,
                           CASE
                             WHEN MatchValue1 <> MatchValue2
                               THEN 'Not equal'
                             ELSE 'Something else'
                           END AS MatchTest
                    FROM   (SELECT T.PropertyID AS KeyID,
                                   T.Ap1        AS MatchValue1,
                                   T.Ap2        AS MatchValue2) TestRow
                    WHERE  MatchValue1 <> MatchValue2) LP 

enter image description here

Here the columns used in the final projection are Expr1003, Expr1004, Expr1005, Expr1006. All of these are defined in the bottom right constant scan.

In the case of the TVF it all seems to go wrong very early on.

Adding OPTION (RECOMPILE, QUERYTRACEON 3604, QUERYTRACEON 8606); shows the input tree to the process is already incorrect. Expressed in SQL it is something like.

SELECT Expr1000,
       Expr1001,
       Expr1003,
       Expr1004
FROM   (VALUES (12,
               $350000,
               350000)) V1(Expr1000, Expr1001, Expr1002)
       OUTER APPLY (SELECT Expr1003,
                           IIF(Expr1001 <> Expr1003, 
                               'Not equal', 
                               'Something else') AS Expr1004
                    FROM   (SELECT CAST(Expr1002 AS MONEY) AS Expr1003) D
                    WHERE  Expr1001 <> Expr1003) OA 

The full output of that trace flag is as follows (And 8605 shows basically the same tree.)

*** Input Tree: ***
        LogOp_Project COL: Expr1000  COL: Expr1001  COL: Expr1003  COL: Expr1004 

            LogOp_Apply (x_jtLeftOuter)

                LogOp_Project

                    LogOp_ConstTableGet (1) [empty]

                    AncOp_PrjList 

                        AncOp_PrjEl COL: Expr1000 

                            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=12)

                        AncOp_PrjEl COL: Expr1001 

                            ScaOp_Const TI(money,ML=8) XVAR(money,Not Owned,Value=(10000units)=(-794967296))

                        AncOp_PrjEl COL: Expr1002 

                            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=350000)

                LogOp_Project

                    LogOp_Select

                        LogOp_Project

                            LogOp_ConstTableGet (1) [empty]

                            AncOp_PrjList 

                                AncOp_PrjEl COL: Expr1003 

                                    ScaOp_Convert money,Null,ML=8

                                        ScaOp_Identifier COL: Expr1002 

                        ScaOp_Comp x_cmpNe

                            ScaOp_Identifier COL: Expr1001 

                            ScaOp_Identifier COL: Expr1003 

                    AncOp_PrjList 

                        AncOp_PrjEl COL: Expr1004 

                            ScaOp_IIF varchar collate 53256,Var,Trim,ML=14

                                ScaOp_Comp x_cmpNe

                                    ScaOp_Identifier COL: Expr1001 

                                    ScaOp_Identifier COL: Expr1003 

                                ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=9) XVAR(varchar,Owned,Value=Len,Data = (9,Not equal))

                                ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=14) XVAR(varchar,Owned,Value=Len,Data = (14,Something else))

            AncOp_PrjList 

*******************
like image 117
Martin Smith Avatar answered Oct 20 '22 21:10

Martin Smith


I did some further research (SQL Server 2012) - and this is really weird!

You can simplify this. It seems to me that it has something to do with implicit type conversion. That's the reason why I tried around with data types...

Try this:

--Test set-up
CREATE FUNCTION dbo.TVFTest
(
       @ValueInt INT,
       @ValueMoney MONEY,
       @ValueVarchar VARCHAR(10),
       @ValueDate DATE,
       @DateAsVarchar DATE

)
RETURNS TABLE AS RETURN
(
       SELECT @ValueInt AS ValueInt
             ,@ValueMoney AS ValueMoney
             ,@ValueVarchar AS ValueVarchar
             ,@ValueDate AS ValueDate
             ,@DateAsVarchar AS DateAsVarchar
        WHERE 1 != 1
)
GO

This function will never return a line due to the WHERE...

DECLARE @d AS DATE='20150101';

This typed date variable is needed later, try to replace it in the calls by GETDATE()...

--direct call: comes back with no row
SELECT * FROM dbo.TVFTest(1,2,'test',@d,'20150101');

--parameters via CTE: 
WITH Test AS
(
       SELECT 1 AS valint,
              2 AS valmoney,
              'test' AS valchar,
              @d AS valdate, --try GETDATE() here!
              '20150101' AS valdateasvarchar
)
SELECT * 
FROM Test AS T
OUTER APPLY dbo.TVFTest(T.valint,T.valmoney,T.valchar,T.valdate,T.valdateasvarchar) AS LP;

Both implicitly converted parameters (Money and DateAsVarchar) don't show up, but the INT, the VARCHAR and the "real" DATE do!!!
Look at the execution plan: enter image description here This call was done with GETDATE(). Otherwise there'd be only 2 scalar operators...

EDIT: The first "Compute Scalar" in the execution plan shows all columns, the Constant Scan (scanning an internal table with constants) has only two columns (three if you use GETDATE()). The "bad" columns don't even seem to be part of the CTE at this stage...

--parameters via CTE with single calls 
WITH Test AS
(
       SELECT 1 AS valint,
              2 AS valmoney,
              'test' AS valchar,
              @d AS valdate,
              '20150101' AS valdateasvarchar
)
SELECT * FROM dbo.TVFTest((SELECT valint FROM Test)
                         ,(SELECT valmoney FROM Test)
                         ,(SELECT valchar FROM Test)
                         ,(SELECT valdate FROM Test)
                         ,(SELECT valdateasvarchar FROM Test));
GO
DROP FUNCTION dbo.TVFTest;

Just one more try, this returns with the expected result (empty)

My conclusio: Only scalar values which need some extra handling are handled and therefore "know" that they shouldn't show up. All scalar values which can be passed through without any extra work are not handled within the function and show up - which is a bug.

What's your opinion?

like image 38
Shnugo Avatar answered Oct 20 '22 19:10

Shnugo