Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Conversion to datetime fails only on WHERE clause?

I'm having a problem with some SQL server queries. Turns out that I have a table with "Attibute_Name" and "Attibute_Value" fields, which can be of any type, stored in varchar. (Yeah... I know.)

All the dates for a particular attribute seem to be stored the the "YYYY-MM-DD hh:mm:ss" format (not 100% sure about that, there are millions of records here), so I can execute this code without problems:

select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from 
    ProductAttributes pa
    inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where 
    a.Attribute_Name = 'SomeDate'

However, if I execute the following code:

select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from 
    ProductAttributes pa
    inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where 
    a.Attribute_Name = 'SomeDate'
    and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()

I will get the following error: Conversion failed when converting date and/or time from character string.

How come it fails on the where clause and not on the select one?

Another clue:

If instead of filtering by the Attribute_Name I use the actual Attribute_ID stored in database (PK) it will work without problem.

select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from 
    ProductAttributes pa
    inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where 
    a.Attribute_ID = 15
    and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()

Update Thanks everyone for the answers. I found it hard to actually choose a correct answer because everyone pointed out something that was useful to understanding the issue. It was definitely having to do with the order of execution. Turns out that my first query worked correctly because the WHERE clause was executed first, then the SELECT. My second query failed because of the same reason (as the Attributes were not filtered, the conversion failed while executing the same WHERE clause). My third query worked because the ID was part of an index (PK), so it took precedence and it drilled down results on that condition first.

Thanks!

like image 489
Alpha Avatar asked Aug 31 '11 21:08

Alpha


3 Answers

You seem to be assuming some sort of short circuiting evaluation or guaranteed ordering of the predicates in the WHERE clause. This is not guaranteed. When you have mixed datatypes in a column like that the only safe way of dealing them is with a CASE expression.

Use (e.g.)

CONVERT(DATETIME, 
      CASE WHEN ISDATE(pa.Attribute_Value) = 1 THEN pa.Attribute_Value END)

Not

CONVERT(DATETIME, pa.Attribute_Value)
like image 175
Martin Smith Avatar answered Sep 30 '22 01:09

Martin Smith


This has to do with the order that a SELECT query is processed. The WHERE clause is processed long before the SELECT. It has to determine which rows to include/exclude. The clause that uses the name must use a scan that investigates all rows, some of which do not contain valid date/time data, whereas the key probably leads to a seek and none of the invalid rows are included at the point. The convert in the SELECT list is performed last, and clearly by this time it is not going to try to convert invalid rows. Since you're mixing date/time data with other data, you may consider storing date or numeric data in dedicated columns with correct data types. In the meantime, you can defer the check in the following way:

SELECT /* ... */
FROM
(
  SELECT /* ... */
    FROM ProductAttributes AS pa
    INNER JOIN dbo.Attributes AS a
    ON a.Attribute_ID = pa.Attribute_ID
    WHERE a.Attribute_Name = 'SomeDate'
    AND ISDATE (pa.Attribute_Value) = 1
) AS z
WHERE CONVERT(CHAR(8), AttributeValue, 112) < CONVERT(CHAR(8), GETDATE(), 112);

But the better answer is probably to use the Attribute_ID key instead of the name if possible.

like image 28
Aaron Bertrand Avatar answered Sep 30 '22 02:09

Aaron Bertrand


If the conversion is in the WHERE clause it may be evaluated for many more records (values) than it would be if it appears in the projection list. I have talked about this before in different context, see T-SQL functions do no imply a certain order of execution and On SQL Server boolean operator short-circuit. Your case is even simpler, but is similar, and ultimately the root cause is the same: do not an assume an imperative execution order when dealing with a declarative language like SQL.

Your best solution, by a far and a large margin, is to sanitize the data and change the column type to a DATETIME or DATETIME2 type. All other workarounds will have one shortcoming or another, so you may be better to just do the right thing.

Update

After a closer look (sorry, I'm @VLDB and only peeking SO between sessions) I realize you have an EAV store with inherent type-free semantics (the attribute_value can bea string, a date, an int etc). My opinion is that your best bet is to use sql_variant in storage and all the way up to the client (ie. project sql_variant). You can coherce the type in the client, all client APIs have methods to extract the inner type from a sql_variant, see Using sql_variant Data (well, almost all client APIs... Using the sql_variant datatype in CLR). With sql_variant you can store multiple types w/o the problems of going through a string representations, you can use SQL_VARIANT_PROPERTY to inspect things like the BaseType in the stored values, and you can even do thinks like check constraints to enforce data type correctness.

like image 36
Remus Rusanu Avatar answered Sep 30 '22 03:09

Remus Rusanu