Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MSSQL cast( [varcharColumn] to int) in SELECT gets executed before WHERE clause filters out bad values

Assume the following schema and query:

Please look past the glaring design issues with having values in a varchar column that we expect to be ints.

create table dbo.Parent (
    Id bigint NOT NULL,
    TypeId int NOT NULL
)

create table dbo.Child (
    Id bigint NOT NULL,
    ParentId bigint NOT NULL,
    TypeId int NOT NULL,
    varcharColumn varchar(300) NULL
)

select cast(c.varcharColumn as int)
from dbo.Parent p (nolock)
    inner join dbo.Child c (nolock)
        on p.Id = c.ParentId
            and c.TypeId = 2
where p.TypeId = 13

The break:

We get a cast break due to a value that cannot be converted to an int. In this case: "123-1". The strange thing is that the value being cast gets filtered out of the final result set.

For example, this returns zero results

select c.varcharColumn
from dbo.Parent p (nolock)
    inner join dbo.Child c (nolock)
        on p.Id = c.ParentId
            and c.TypeId = 2
where p.TypeId = 13
    and c.varcharColumn = '123-1'

The query plan ends up looking at the Child table and actually applying the cast function before the where clause.

We were able to fix this by creating a new index on the child table (it was doing a PK scan)

create index [NCIDX_dbo_Child__TypeId] on dbo.Child (
    TypeId
)
include (
    ParentId,
    varcharColumn
)

It now filters on the parent table's where clause first.

Is there any way to fix this without the extra index? And again, please refrain from any suggestions related to fixing our schema. That is definitely the proper fix in this case.

I'm mostly interested in understanding why it applied the cast before it filtered the result set.

Thanks

Edit - Answer:

Many thanks to both Aaron and Gordon. If I ever get more than 15 rep points, I'll come back and up both of your replies.

We ended up needing Gordon's answer since we wanted to use this query in a view. A few folks at the office were wary of using a case statement because they prefer to have more control over ensuring that we have a smaller result set first (Aaron's answer), however it all boils down to looking at the query plan and checking your read counts.

Again, thanks for all of the responses!

like image 580
Chris Barry Avatar asked Sep 06 '12 18:09

Chris Barry


People also ask

What does CAST () do in SQL?

The CAST() function converts a value (of any type) into a specified datatype.

Can you CAST a string to an int in SQL?

CAST and CONVERT can be used to convert a string to a number of any data type. For example, you can convert a string to a number of data type INTEGER. TO_DATE converts a formatted date string to a date integer.

How can avoid in clause in SQL Server?

Select U. UserIds from User as U join Admins as A on U.

What is CAST () and convert () functions in SQL Server?

The T-SQL language offers two functions to convert data from one data type to a target data type: CAST and CONVERT. In many ways, they both do the exact same thing in a SELECT statement or stored procedure, but the SQL Server CONVERT function has an extra parameter to express style.


2 Answers

First, this is not a "glaring design issue". SQL is a descriptive language of the output, not a procedural language that specifies how prcoessing is being done. There is no guarantee of the order of processing, in general, and this is an advantage. I might say there is a design issue, but it is around the general handling of exceptions in SQL statements.

According to SQL Server documentation (http://msdn.microsoft.com/en-us/library/ms181765.aspx), you can depend on the order of evauation for a CASE statement for scalar expressions. So, the following should work:

select (case when isnumeric(c.varcharColumn) = 1 then cast(c.varcharColumn as int) end)

Or, to get closer to an "int" expression:

select (case when isnumeric(c.varcharColumn) = 1 and c.varcharColumn not like '%.%' and c.varcharColumn not like '%e%'
             then cast(c.varcharColumn as int)
        end)

At least your code is doing an explicit CAST. This situation is much nastier when the casts are implicit (and there are hundreds of columns).

like image 54
Gordon Linoff Avatar answered Oct 06 '22 01:10

Gordon Linoff


You can't easily control the way SQL Server processes your query. You can figure out some of the why by deep diving into the execution plan, but understanding that is the least of your problems in this specific case I think. You can do a little with join hints, perhaps, but that's hacky to me and the behavior is still not guaranteed (especially as you move to new versions etc). Two workarounds you could try are:

;WITH c AS 
(
  SELECT varcharColumn, ParentID, TypeId
   FROM dbo.Child AS c
   WHERE c.TypeId = 2
   AND ISNUMERIC(varcharColumn) = 1 --*
)
SELECT CONVERT(INT, c.varcharColumn)
FROM dbo.Parent AS p
INNER JOIN c
ON c.ParentId = p.Id
WHERE p.TypeId = 13;

But I have heard of cases where even separating this out into a CTE could lead to the bad plan that led the convert to occur first. So it may be that you need to break it out even further:

SELECT varcharColumn, ParentID, TypeId
INTO #c
   FROM dbo.Child AS c
   WHERE c.TypeId = 2
   AND ISNUMERIC(varcharColumn) = 1; --*

SELECT CONVERT(INT, c.varcharColumn)
  FROM dbo.Parent AS p
  INNER JOIN #c AS c
  ON c.ParentId = p.Id
  WHERE p.TypeId = 13;

(I also talk about the CASE expression solution in this answer.)

If you are on SQL Server 2012, you can simply do this - now it doesn't matter if the convert is attempted before the filter, and you don't have to rely on the wonky ISNUMERIC() function.*

SELECT TRY_CONVERT(INT, c.varcharColumn)
  FROM dbo.Parent AS p
  INNER JOIN dbo.Child AS c
  ON c.ParentId = p.Id
  WHERE c.TypeId = 2
  AND p.TypeId = 13;

* Please note that IsNumeric is not perfect. I wrote this article several years ago to help deal with this.

like image 30
Aaron Bertrand Avatar answered Oct 06 '22 01:10

Aaron Bertrand