I'm having a problem with some SQL server queries. Turns out that I have a table with "Attibute_Name" and "Attibute_Value" fields, which can be of any type, stored in varchar. (Yeah... I know.)
All the dates for a particular attribute seem to be stored the the "YYYY-MM-DD hh:mm:ss" format (not 100% sure about that, there are millions of records here), so I can execute this code without problems:
select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from
ProductAttributes pa
inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where
a.Attribute_Name = 'SomeDate'
However, if I execute the following code:
select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from
ProductAttributes pa
inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where
a.Attribute_Name = 'SomeDate'
and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()
I will get the following error: Conversion failed when converting date and/or time from character string.
How come it fails on the where clause and not on the select one?
Another clue:
If instead of filtering by the Attribute_Name I use the actual Attribute_ID stored in database (PK) it will work without problem.
select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from
ProductAttributes pa
inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where
a.Attribute_ID = 15
and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()
Update Thanks everyone for the answers. I found it hard to actually choose a correct answer because everyone pointed out something that was useful to understanding the issue. It was definitely having to do with the order of execution. Turns out that my first query worked correctly because the WHERE clause was executed first, then the SELECT. My second query failed because of the same reason (as the Attributes were not filtered, the conversion failed while executing the same WHERE clause). My third query worked because the ID was part of an index (PK), so it took precedence and it drilled down results on that condition first.
Thanks!
You seem to be assuming some sort of short circuiting evaluation or guaranteed ordering of the predicates in the WHERE
clause. This is not guaranteed. When you have mixed datatypes in a column like that the only safe way of dealing them is with a CASE
expression.
Use (e.g.)
CONVERT(DATETIME,
CASE WHEN ISDATE(pa.Attribute_Value) = 1 THEN pa.Attribute_Value END)
Not
CONVERT(DATETIME, pa.Attribute_Value)
This has to do with the order that a SELECT
query is processed. The WHERE
clause is processed long before the SELECT
. It has to determine which rows to include/exclude. The clause that uses the name must use a scan that investigates all rows, some of which do not contain valid date/time data, whereas the key probably leads to a seek and none of the invalid rows are included at the point. The convert in the SELECT list is performed last, and clearly by this time it is not going to try to convert invalid rows. Since you're mixing date/time data with other data, you may consider storing date or numeric data in dedicated columns with correct data types. In the meantime, you can defer the check in the following way:
SELECT /* ... */
FROM
(
SELECT /* ... */
FROM ProductAttributes AS pa
INNER JOIN dbo.Attributes AS a
ON a.Attribute_ID = pa.Attribute_ID
WHERE a.Attribute_Name = 'SomeDate'
AND ISDATE (pa.Attribute_Value) = 1
) AS z
WHERE CONVERT(CHAR(8), AttributeValue, 112) < CONVERT(CHAR(8), GETDATE(), 112);
But the better answer is probably to use the Attribute_ID
key instead of the name if possible.
If the conversion is in the WHERE clause it may be evaluated for many more records (values) than it would be if it appears in the projection list. I have talked about this before in different context, see T-SQL functions do no imply a certain order of execution and On SQL Server boolean operator short-circuit. Your case is even simpler, but is similar, and ultimately the root cause is the same: do not an assume an imperative execution order when dealing with a declarative language like SQL.
Your best solution, by a far and a large margin, is to sanitize the data and change the column type to a DATETIME or DATETIME2 type. All other workarounds will have one shortcoming or another, so you may be better to just do the right thing.
Update
After a closer look (sorry, I'm @VLDB and only peeking SO between sessions) I realize you have an EAV store with inherent type-free semantics (the attribute_value
can bea string, a date, an int etc). My opinion is that your best bet is to use sql_variant
in storage and all the way up to the client (ie. project sql_variant
). You can coherce the type in the client, all client APIs have methods to extract the inner type from a sql_variant
, see Using sql_variant Data (well, almost all client APIs... Using the sql_variant datatype in CLR). With sql_variant
you can store multiple types w/o the problems of going through a string representations, you can use SQL_VARIANT_PROPERTY
to inspect things like the BaseType
in the stored values, and you can even do thinks like check constraints to enforce data type correctness.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With