Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

4000 character limit in LIKE statement

I have been getting an error in a previously working stored procedure called by an SSRS report and I have traced it down to a LIKE statement in a scalar function that is called by the stored procedure, in combination with a 7000+ NVARCHAR(MAX) string. It is something similar to:

Msg 8152, Level 16, State 10, Line 14
String or binary data would be truncated.

I can reproduce it with the following code:

DECLARE @name1 NVARCHAR(MAX) = ''
DECLARE @name2 NVARCHAR(MAX) = ''
DECLARE @count INT = 4001
WHILE @count > 0
    BEGIN
        SET @name1 = @name1 + 'a'
        SET @name2 = @name2 + 'a'
        SET @count = @count - 1
    END

SELECT LEN(@name1)

IF @name1 LIKE @name2
    PRINT 'OK'

What's the deal? Is there anyway around this limitation, or is it there for good reason? Thanks.

like image 855
JonathanN Avatar asked Jan 12 '23 10:01

JonathanN


2 Answers

You can also reproduce it without the terrible loop:

DECLARE @name1 NVARCHAR(MAX), @name2 NVARCHAR(MAX);

SET @name1 = REPLICATE(CONVERT(NVARCHAR(MAX), N'a'), 4000);
SET @name2 = @name1;

IF @name1 LIKE @name2
  PRINT 'OK';

SELECT @name1 += N'a', @name2 += N'a';

IF @name1 LIKE @name2
  PRINT 'OK';

Result:

OK
Msg 8152, Level 16, State 10, Line 30
String or binary data would be truncated.

In any case, the reason is clearly stated in the documentation for LIKE (emphasis mine):

match_expression [ NOT ] LIKE pattern [ ESCAPE escape_character ]

...

pattern

Is the specific string of characters to search for in match_expression, and can include the following valid wildcard characters. pattern can be a maximum of 8,000 bytes.

And 8,000 bytes is used up by 4,000 Unicode characters.

I would suggest that comparing the first 4,000 characters is probably sufficient:

WHERE column LIKE LEFT(@param, 4000) + '%';

I can't envision any scenario where you want to compare the whole thing; how many strings contain the same first 4000 characters but then character 4001 is different? If that really is a requirement, I guess you could go to the great lengths identified in the Connect item David pointed out.

A simpler (though probably much more computationally expensive) workaround might be:

IF CONVERT(VARBINARY(MAX), @name1) = CONVERT(VARBINARY(MAX), @name2)
  PRINT 'OK';

I suggest that it would be far better to fix the design and stop identifying rows by comparing large strings. Is there really no other way to identify the row you're after? This is like finding your car in the parking lot by testing the DNA of all the Dunkin Donuts cups in all the cup holders, rather than just checking the license plate.

like image 78
Aaron Bertrand Avatar answered Jan 14 '23 22:01

Aaron Bertrand


I have the same problem right now, and I do believe my situation -where you want to compare two strings with more than 4000 characters- is a possible situation :-). In my situation, I'm collecting a lot of data from different tables in a NVARCHAR(MAX) field in a specific table, to be able to search on that data using FullText. Keeping that table in sync, is done using the MERGE statement, converting everything to NVARCHAR(MAX). So my MERGE statement would look like this:

MERGE MyFullTextTable AS target  
    USING (
        SELECT --Various stuff from various tables, casting it as NVARCHAR(MAX)
            ...
        ) AS source (IndexColumn, FullTextColumn)  
    ON (target.IndexColumn = source.IndexColumn)  
WHEN MATCHED AND source.FullTextColumn NOT LIKE target.FullTextColumn THEN   
        UPDATE SET FullTextColumn = source.FullTextColumn
WHEN NOT MATCHED THEN  
    INSERT (IndexColumn, FullTextColumn)  
    VALUES (source.IndexColumn, source.FullTextColumn)
    OUTPUT -- Some stuff

This would produce errors because of the LIKE comparison when the FullText-data is bigger than 4000 characters.

So I created a function that does the comparison. Allthough it's not bullet proof, it works for me. You could also split data in blocks of 4000 characters, and compare each block, but for me (for now) comparing the first 4000 characters in combination with the length, is enough ...

So the Merge-statement would look like:

MERGE MyFullTextTable AS target  
    USING (
        SELECT --Various stuff from various tables, casting it as NVARCHAR(MAX)
            ...
        ) AS source (IndexColumn, FullTextColumn)  
    ON (target.IndexColumn = source.IndexColumn)  
WHEN MATCHED AND udfCompareTwoTexts(source.FullTextColumn, target.FullTextColumn) = 1 THEN   
        UPDATE SET FullTextColumn = source.FullTextColumn
WHEN NOT MATCHED THEN  
    INSERT (IndexColumn, FullTextColumn)  
    VALUES (source.IndexColumn, source.FullTextColumn)
    OUTPUT -- Some stuff

And the function looks like:

ALTER FUNCTION udfCompareTwoTexts
(
    @Value1 AS NVARCHAR(MAX),
    @Value2 AS NVARCHAR(MAX)
)
RETURNS BIT
AS
BEGIN
    DECLARE @ReturnValue AS BIT = 0
    IF LEN(@Value1) > 4000 OR LEN(@Value2) > 4000 
        BEGIN
            IF LEN(@Value1) = LEN(@Value2) AND LEFT(@Value1, 4000) LIKE LEFT(@Value2, 4000)
                SET @ReturnValue = 1
            ELSE
                SET @ReturnValue = 0
        END
    ELSE
        BEGIN
            IF @Value1 LIKE @Value2
                SET @ReturnValue = 1
            ELSE
                SET @ReturnValue = 0
        END
    RETURN @ReturnValue;
END
GO
like image 21
Tim Avatar answered Jan 14 '23 23:01

Tim