Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Behavior of unique index, varchar column and (blank) spaces

I'm using Microsoft SQL Server 2008 R2 (with latest service pack/patches) and the database collation is SQL_Latin1_General_CP1_CI_AS.

The following code:

SET ANSI_PADDING ON;
GO

CREATE TABLE Test (
   Code VARCHAR(16) NULL
);
CREATE UNIQUE INDEX UniqueIndex
    ON Test(Code);

INSERT INTO Test VALUES ('sample');
INSERT INTO Test VALUES ('sample ');

SELECT '>' + Code + '<' FROM Test WHERE Code = 'sample        ';
GO

produces the following results:

(1 row(s) affected)

Msg 2601, Level 14, State 1, Line 8

Cannot insert duplicate key row in object 'dbo.Test' with unique index 'UniqueIndex'. The duplicate key value is (sample ).

The statement has been terminated.

‐‐‐‐‐‐‐‐‐‐‐‐

>sample<

(1 row(s) affected)

My questions are:

  1. I assume the index cannot store trailing spaces. Can anyone point me to official documentation that specifies/defines this behavior?
  2. Is there a setting to change this behavior, that is, make it recognize 'sample' and 'sample ' as two different values (which they are, by the way) so both can be in the index.
  3. Why on Earth is the SELECT returning a row? SQL Server must be doing something really funny/clever with the spaces in the WHERE clause because if I remove the uniqueness in the index, both INSERTs will run OK and the SELECT will return two rows!

Any help/pointer in the right direction would be appreciated. Thanks.

like image 899
Eric Avatar asked Feb 27 '12 06:02

Eric


1 Answers

Trailing blanks explained:

SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2, , General rules #3) on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them. The padding directly affects the semantics of WHERE and HAVING clause predicates and other Transact-SQL string comparisons. For example, Transact-SQL considers the strings 'abc' and 'abc ' to be equivalent for most comparison operations.

The only exception to this rule is the LIKE predicate. When the right side of a LIKE predicate expression features a value with a trailing space, SQL Server does not pad the two values to the same length before the comparison occurs. Because the purpose of the LIKE predicate, by definition, is to facilitate pattern searches rather than simple string equality tests, this does not violate the section of the ANSI SQL-92 specification mentioned earlier.

Here's a well known example of all the cases mentioned above:

DECLARE @a VARCHAR(10)
DECLARE @b varchar(10)

SET @a = '1'
SET @b = '1 ' --with trailing blank

SELECT 1
WHERE 
    @a = @b 
AND @a NOT LIKE @b
AND @b LIKE @a

Here's some more detail about trailing blanks and the LIKE clause.

Regarding indexes:

An insertion into a column whose values must be unique will fail if you supply a value that is differentiated from existing values by trailing spaces only. The following strings will all be considered equivalent by a unique constraint, primary key, or unique index. Likewise, if you have an existing table with the data below and try to add a unique restriction, it will fail because the values are considered identical.

PaddedColumn
------------
'abc'
'abc '
'abc  '
'abc    '

(Taken from here.)

like image 110
Oleg Dok Avatar answered Sep 28 '22 08:09

Oleg Dok