
Why is query with phone = N'1234' slower than phone = '1234'?

I have a field which is a varchar(20).

When this query is executed, it is fast (it uses an index seek):

SELECT * FROM [dbo].[phone] WHERE phone = '5554474477'

But this one is slow (it uses an index scan):

SELECT * FROM [dbo].[phone] WHERE phone = N'5554474477'

I am guessing that if I changed the field to an nvarchar, it would use an index seek.

asked Feb 05 '19 by Greg Gum

2 Answers

Because nvarchar has higher datatype precedence than varchar, SQL Server needs to perform an implicit cast of the column to nvarchar, and this prevents an index seek.

Under some collations SQL Server is still able to use a seek and just push the cast into a residual predicate against the rows matched by the seek (rather than having to cast every row in the entire table via a scan), but presumably you aren't using such a collation.

The effect of collation on this is illustrated below. With the SQL collation you get a scan; with the Windows collation SQL Server calls the internal function GetRangeThroughConvert and is able to convert the predicate into a seek.

CREATE TABLE [dbo].[phone]
  (
     phone1 VARCHAR(500) COLLATE SQL_Latin1_General_CP1_CI_AS CONSTRAINT uq1 UNIQUE,
     phone2 VARCHAR(500) COLLATE Latin1_General_CI_AS CONSTRAINT uq2 UNIQUE
  );

SELECT phone1 FROM [dbo].[phone] WHERE phone1 = N'5554474477';
SELECT phone2 FROM [dbo].[phone] WHERE phone2 = N'5554474477';

[Execution plans: the SQL collation query produces an Index Scan; the Windows collation query produces an Index Seek]

The SHOWPLAN_TEXT output is below.

Query 1

  |--Index Scan(OBJECT:([tempdb].[dbo].[phone].[uq1]),  WHERE:(CONVERT_IMPLICIT(nvarchar(500),[tempdb].[dbo].[phone].[phone1],0)=CONVERT_IMPLICIT(nvarchar(4000),[@1],0)))

Query 2

  |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1005], [Expr1006], [Expr1004]))
       |--Compute Scalar(DEFINE:(([Expr1005],[Expr1006],[Expr1004])=GetRangeThroughConvert([@1],[@1],(62))))
       |    |--Constant Scan
       |--Index Seek(OBJECT:([tempdb].[dbo].[phone].[uq2]), SEEK:([tempdb].[dbo].[phone].[phone2] > [Expr1005] AND [tempdb].[dbo].[phone].[phone2] < [Expr1006]),  WHERE:(CONVERT_IMPLICIT(nvarchar(500),[tempdb].[dbo].[phone].[phone2],0)=[@1]) ORDERED FORWARD)

In the second case the Compute Scalar emits the following values:

Expr1004 = 62
Expr1005 = '5554474477'
Expr1006 = '5554474478'

The seek predicate shown in the plan is on phone2 > Expr1005 AND phone2 < Expr1006, so on the face of it this would exclude '5554474477' itself, but the flag value 62 (Expr1004) means that this value does in fact match.

answered Oct 13 '22 by Martin Smith


Other answers already explain what happens: we've seen that NVARCHAR has higher type precedence than VARCHAR. I want to explain why the database must cast every row of the column to NVARCHAR, rather than casting the single supplied value to VARCHAR, even though the second option is clearly much faster, both intuitively and empirically. I also want to explain why the performance impact can be so drastic.

Casting from NVARCHAR to VARCHAR is a narrowing conversion. That is, an NVARCHAR value potentially holds more information than a similar VARCHAR value. It's not possible to represent every NVARCHAR input with a VARCHAR output, so casting from the former to the latter potentially loses information. The opposite cast is a widening conversion. Casting from a VARCHAR value to an NVARCHAR value never loses information; it's safe.
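
A quick way to see the information loss for yourself is to cast a string containing characters outside the code page of a Latin1-based collation (a sketch; it assumes such a default collation, and the strings are just illustrations):

SELECT CAST(N'テスト' AS varchar(20));                          -- '???': the characters are lost
SELECT CAST(CAST('5554474477' AS varchar(20)) AS nvarchar(20)); -- N'5554474477': round-trips safely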

The principle is for SQL Server to always choose the safe conversion when presented with two mismatched types. It's the same old "correctness trumps performance" mantra. Or, to paraphrase Benjamin Franklin, "He who would trade essential correctness for a little performance deserves neither correctness nor performance." The type precedence rules, then, are designed to ensure that the safe conversion is chosen.

Now you and I both know your narrowing conversion is also safe for this particular data, but the SQL Server query optimizer doesn't care about that. For better or worse, it sees the data type information first when building the execution plan and follows the type precedence rules.
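
You can see the precedence rule at work without any table at all: when a varchar and an nvarchar meet in an expression, the result is always nvarchar (a minimal illustration):

SELECT SQL_VARIANT_PROPERTY('abc' + N'def', 'BaseType');  -- 'nvarchar': the varchar side was promoted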


This explains why SQL Server has to take the slower option. Now let's talk about why the difference is so drastic. The kicker is that once we've determined we have to cast the stored data, rather than the constant value in the query, we have to do it for every row in the table. This is true even for rows that would not otherwise match the filter, because we don't know whether a row matches until after its value has been cast for the comparison.

But it gets worse. The cast values from the column are no longer the same as the values stored in any indexes you might have defined. The result is any index on the column is now worthless for this query, which cuts to the core of database performance.

I think you're very lucky to get an index scan for this query, rather than a full table scan, and it's likely because there is a covering index that meets the needs of the query (the optimizer can choose to cast all the records in the index as easily as all the records in the table).
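
One way to see the difference for yourself is to compare logical reads for the two forms of the query (a sketch against the dbo.phone table from the question; the exact read counts depend on the table size):

SET STATISTICS IO ON;

SELECT * FROM [dbo].[phone] WHERE phone = '5554474477';   -- seek: a handful of logical reads
SELECT * FROM [dbo].[phone] WHERE phone = N'5554474477';  -- scan: reads the entire index

SET STATISTICS IO OFF;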


You can fix things for this query by explicitly resolving the type mismatch in a more favorable way. The best way to accomplish this is, of course, supplying a plain VARCHAR in the first place and avoiding any need for casting/conversion at all:

SELECT * FROM [dbo].[phone] WHERE phone = '5554474477'

But I suspect what we're seeing is a value provided by an application, where you don't necessarily control that part of the literal. If so, you can still do this:

SELECT * FROM [dbo].[phone] WHERE phone = cast(N'5554474477' as varchar(20))

Either example favorably resolves the type mismatch from the original code. Even in the latter situation, you may have more control over the literal than you know. For example, if this query was created by a .NET program, the problem is possibly related to the AddWithValue() function. I've written about this issue in the past and how to handle it correctly.
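
From the database side, the AddWithValue() problem looks like the first call below, because .NET strings are sent as nvarchar parameters; explicitly typing the parameter as varchar restores the seek (a sketch, and the parameter name @p is mine):

-- What AddWithValue() effectively sends: an nvarchar parameter (index scan)
EXEC sp_executesql
    N'SELECT * FROM [dbo].[phone] WHERE phone = @p',
    N'@p nvarchar(4000)',
    @p = N'5554474477';

-- A parameter typed to match the column (index seek)
EXEC sp_executesql
    N'SELECT * FROM [dbo].[phone] WHERE phone = @p',
    N'@p varchar(20)',
    @p = '5554474477';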

These fixes also help demonstrate why things are this way.


It is possible that at some point in the future the SQL Server developers will enhance the query optimizer to recognize situations like this (where the type precedence rules force a per-row conversion resulting in a table or index scan, while the opposite conversion involves only constant data and could use an index seek) and, in that case, first examine the data to see whether the faster narrowing conversion would also be safe.

However, I find it unlikely this will ever happen. In my opinion, fixing the affected queries within the existing system is too easy, relative to the additional cost of evaluating individual queries and the added difficulty of understanding what the optimizer is doing ("Why didn't the server follow the documented precedence rules here?"), to justify it.

answered Oct 13 '22 by Joel Coehoorn