I'm currently wondering about some performance differences between VARCHAR and NVARCHAR, especially with complex LIKE queries (patterns that start with _ or %).
I have a test setup on Microsoft SQL Server 2014 with two tables. Both have an Id field (IDENTITY(1, 1)) and a Value field (either VARCHAR(450) or NVARCHAR(450)), and both contain the identical 1'000'000 randomly generated entries.
The tables are named tblVarcharNoIndex and tblNVarcharNoIndex (so there is no index on Value; the behaviour is nearly the same if I use indexes).
Now I execute the following queries to test the duration (once on VARCHAR, once on NVARCHAR):
SELECT * FROM tblVarcharNoIndex WHERE Value LIKE '%ab%'
SELECT * FROM tblNVarcharNoIndex WHERE Value LIKE '%ab%'
The execution times are wildly different: 1540 ms on the VARCHAR table versus 8630 ms on the NVARCHAR table, so it takes over 5x longer with NVARCHAR.
I understand that NVARCHAR has performance implications, since it needs 2 bytes per character to store; that totally makes sense. But I can't explain a performance degradation of roughly 500%; that makes no sense to me.
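As a quick illustration of the storage difference (this check is not part of the original test setup), DATALENGTH shows that the same ASCII string occupies twice the bytes as NVARCHAR:

```sql
-- Hypothetical sanity check: DATALENGTH returns bytes, not characters.
-- The same string needs 2 bytes per character as NVARCHAR (UCS-2/UTF-16).
SELECT DATALENGTH(CAST('ab' AS VARCHAR(10)))  AS VarcharBytes,   -- 2
       DATALENGTH(CAST('ab' AS NVARCHAR(10))) AS NVarcharBytes;  -- 4
```

So a pure scan over twice the bytes would plausibly explain a ~2x slowdown, which is what makes the observed ~5x gap surprising.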
As requested, here is some more data.
Query for table creation
CREATE TABLE [dbo].[tblVarcharNoIndex](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Value] [varchar](450) NOT NULL,
    CONSTRAINT [PK_tblVarcharNoIndex] PRIMARY KEY CLUSTERED
    (
        [Id] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[tblNVarcharNoIndex](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Value] [nvarchar](450) NOT NULL,
    CONSTRAINT [PK_tblNVarcharNoIndex] PRIMARY KEY CLUSTERED
    (
        [Id] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Query for value generation
DECLARE @cnt INT = 0;
DECLARE @entries INT = 1000000; --1'000'000
DECLARE @maxLength INT = 450;
DECLARE @minLength INT = 50;
DECLARE @value VARCHAR(450);
DECLARE @length INT;

WHILE @cnt < @entries
BEGIN
    SET @value = '';
    -- random length between @minLength and @maxLength
    SET @length = @minLength + CAST(RAND() * (@maxLength - @minLength) AS INT);
    WHILE @length <> 0
    BEGIN
        -- append a random printable ASCII character (codes 32-126);
        -- the original factor of 96 could also produce CHAR(127), the DEL control character
        SET @value = @value + CHAR(CAST(RAND() * 95 + 32 AS INT));
        SET @length = @length - 1;
    END
    INSERT INTO tblBase(Value, NValue) VALUES (@value, @value);
    SET @cnt = @cnt + 1;
END;
(The values are copied later from tblBase into the two test tables.)
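The copy step itself is not shown in the post; a minimal sketch of what it would look like, assuming tblBase holds the generated values in its Value and NValue columns as above:

```sql
-- Hypothetical copy step (not in the original post): move the generated
-- values from the staging table into the two test tables, so both contain
-- identical data and differ only in column type.
INSERT INTO tblVarcharNoIndex (Value)  SELECT Value  FROM tblBase;
INSERT INTO tblNVarcharNoIndex (Value) SELECT NValue FROM tblBase;
```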
LIKE Query in question
DECLARE @start DATETIME2
DECLARE @end DATETIME2
DECLARE @testname NVARCHAR(100) = 'LIKE ''%ab%'''

--VARCHAR No Index
PRINT 'starting ''' + @testname + ''' on VARCHAR (No Index)'
SET @start = SYSDATETIME() -- GETDATE() only has ~3 ms precision; SYSDATETIME() is needed for microseconds
SELECT * FROM tblVarcharNoIndex WHERE Value LIKE '%ab%' --This takes 1540 ms
SET @end = SYSDATETIME()
PRINT '-- finished ''' + @testname + ''' on VARCHAR (No Index)'
PRINT '-- Duration ' + CAST(DATEDIFF(mcs, @start, @end) AS VARCHAR(100)) + ' microseconds'

--NVARCHAR No Index
PRINT 'starting ''' + @testname + ''' on NVARCHAR (No Index)'
SET @start = SYSDATETIME()
SELECT * FROM tblNVarcharNoIndex WHERE Value LIKE '%ab%' --This takes 8630 ms
SET @end = SYSDATETIME()
PRINT '-- finished ''' + @testname + ''' on NVARCHAR (No Index)'
PRINT '-- Duration ' + CAST(DATEDIFF(mcs, @start, @end) AS VARCHAR(100)) + ' microseconds'
Execution plan: The execution plan for both queries looks exactly the same (I have nowhere to upload the image right now, but it's really simple):
SELECT (0%) <--- Parallelism (Gather Streams) (3%) <--- Clustered Index Scan ON Primary Key (97%)
It will not be possible to give you specifics without more data, starting with the execution plans for both queries.
Some general reasons:
- As you stated, there are twice as many bytes to read when doing a scan
- The number of page reads will increase
- The amount of memory required will increase, which can cause spills to disk
- The amount of CPU can increase, which might be capped by OS or SQL Server settings and cause CPU waits
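To gather that data, one approach (a sketch using the asker's table names, not something from the original post) is to enable the I/O and time statistics around both queries; the logical-read counts should show whether the NVARCHAR table really just reads roughly twice the pages, or whether CPU time accounts for the rest of the gap:

```sql
-- Compare page reads and CPU time for the two scans.
-- Output appears in the Messages tab of SSMS.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM tblVarcharNoIndex  WHERE Value LIKE '%ab%';
SELECT * FROM tblNVarcharNoIndex WHERE Value LIKE '%ab%';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If logical reads roughly double but CPU time grows far more than 2x, the bottleneck is the per-row comparison work rather than I/O.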