SQL Server 2008 Empty String vs. Space

Tags: sql-server, tsql, sql-server-2008, string-length, datalength

I ran into something a little odd this morning and thought I'd submit it for commentary.

Can someone explain why the following SQL query prints 'equal' when run against SQL Server 2008? The database compatibility level is set to 100.

```sql
if '' = ' '
    print 'equal'
else
    print 'not equal'
```

And this returns 0:

```sql
select (LEN(' '))
```

It appears to be auto trimming the space. I have no idea if this was the case in previous versions of SQL Server, and I no longer have any around to even test it.

I ran into this because a production query was returning incorrect results. I cannot find this behavior documented anywhere.

Does anyone have any information on this?


asked Sep 09 '09 13:09

jhale

2 Answers

Varchars and equality are thorny in T-SQL. The documentation for the LEN function says:

Returns the number of characters, rather than the number of bytes, of the given string expression, excluding trailing blanks.

You need to use DATALENGTH to get a true byte count of the data in question. If you have Unicode (nchar/nvarchar) data, note that this byte count will not be the same as the number of characters, because each character takes at least two bytes.

```sql
print(DATALENGTH(' '))  -- 1
print(LEN(' '))         -- 0
```
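To make the Unicode caveat concrete, here is a small sketch comparing LEN and DATALENGTH on a varchar literal and the same text as an nvarchar literal (N prefix). It assumes plain ASCII characters, which take one byte each as varchar and two bytes each as nvarchar; the literal 'abc ' is arbitrary.

```sql
-- varchar literal: one byte per ASCII character
print LEN('abc ')          -- 3: trailing blank excluded
print DATALENGTH('abc ')   -- 4: trailing blank included

-- nvarchar literal: two bytes per character here
print LEN(N'abc ')         -- 3: still counts characters, minus trailing blanks
print DATALENGTH(N'abc ')  -- 8: bytes, not characters
```

LEN answers "how many characters, ignoring trailing blanks", while DATALENGTH answers "how many bytes are stored", so only DATALENGTH can see the trailing space.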

When it comes to equality of expressions, the two strings are compared for equality like this:

1. Take the shorter string.
2. Pad it with blanks until its length equals that of the longer string.
3. Compare the two.
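A quick way to see the padding rule in action: trailing blanks on either side do not change the result of =, but a leading blank does, because padding is only ever added at the end. This is a sketch assuming a default case-insensitive collation; the literals are arbitrary.

```sql
-- 'abc' is padded to 'abc   ' before comparing, so these are equal
if 'abc' = 'abc   ' print 'equal' else print 'not equal'   -- equal

-- Padding never happens at the front, so a leading blank matters
if 'abc' = ' abc' print 'equal' else print 'not equal'     -- not equal

-- The stored values really are different sizes
print DATALENGTH('abc')     -- 3
print DATALENGTH('abc   ')  -- 6
```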

It's the middle step that causes the unexpected result: after padding, '' becomes ' ', so you are effectively comparing a blank against a blank, and the two are seen as equal.

LIKE behaves better than = in the "blanks" situation because it doesn't perform blank-padding on the pattern you were trying to match:

```sql
if '' = ' ' print 'eq' else print 'ne'
```

Will give eq while:

```sql
if '' LIKE ' ' print 'eq' else print 'ne'
```

Will give ne

Careful with LIKE, though: it is not symmetrical. It treats trailing whitespace as significant in the pattern (RHS) but not in the match expression (LHS). The following is taken from here:

```sql
declare @Space nvarchar(10)
declare @Space2 nvarchar(10)

set @Space = ''
set @Space2 = ' '

if @Space like @Space2
    print '@Space Like @Space2'
else
    print '@Space Not Like @Space2'

if @Space2 like @Space
    print '@Space2 Like @Space'
else
    print '@Space2 Not Like @Space'
```

Output:

```
@Space Not Like @Space2
@Space2 Like @Space
```


answered Sep 28 '22 21:09

butterchicken

The = operator in T-SQL is not so much "equals" as it is "are the same word/phrase, according to the collation of the expression's context," and LEN is "the number of characters in the word/phrase." No collations treat trailing blanks as part of the word/phrase preceding them (though they do treat leading blanks as part of the string they precede).
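The leading/trailing asymmetry shows up directly in LEN. A short sketch with arbitrary one-letter literals:

```sql
print LEN(' a')  -- 2: the leading blank is counted
print LEN('a ')  -- 1: the trailing blank is not
print LEN(' ')   -- 0: every blank here is a trailing blank
```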

If you need to distinguish 'this' from 'this ', you shouldn't use the "are the same word or phrase" operator because 'this' and 'this ' are the same word.
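If you do need to tell 'this' from 'this ', one common workaround (not something this answer prescribes) is to combine = with a DATALENGTH check, or to append a non-blank sentinel character to both sides so trailing blanks become interior blanks. This is a sketch: the variable names and the '|' sentinel are hypothetical, and the DATALENGTH check assumes both operands have the same string type, since char vs. varchar or varchar vs. nvarchar give different byte counts for the same text.

```sql
declare @a varchar(10) = 'this'
declare @b varchar(10) = 'this '

-- Plain =: the trailing blank is ignored
if @a = @b print 'equal' else print 'not equal'               -- equal

-- Same word AND same byte length: the trailing blank now counts
if @a = @b and DATALENGTH(@a) = DATALENGTH(@b)
    print 'equal'
else
    print 'not equal'                                          -- not equal

-- Sentinel: 'this|' vs 'this |', so the blank is no longer trailing
if @a + '|' = @b + '|' print 'equal' else print 'not equal'   -- not equal
```

Both variants still follow the collation for case and accents; they only stop trailing blanks from being ignored.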

Contributing to the way = works is the idea that the string-equality operator should depend on its arguments' contents and on the collation context of the expression, but it shouldn't depend on the types of the arguments, if they are both string types.

The natural-language concept of "these are the same word" isn't precise enough to be captured by a mathematical operator like =, and there's no concept of string type in natural language. Context (i.e., collation) matters (and exists in natural language) and is part of the story, and additional properties (some that seem quirky) are part of the definition of = in order to make it well-defined in the unnatural world of data.

On the type issue, you wouldn't want words to change when they are stored in different string types. For example, the types VARCHAR(10), CHAR(10), and CHAR(3) can all hold representations of the word 'cat', and ? = 'cat' should let us decide if a value of any of these types holds the word 'cat' (with issues of case and accent determined by the collation).
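The 'cat' example can be checked directly. A sketch with hypothetical variable names: a char(10) variable always stores 'cat' followed by seven padding blanks, yet = still treats all three values as the same word, while DATALENGTH exposes the different storage.

```sql
declare @c10 char(10)    = 'cat'  -- stored as 'cat' plus 7 padding blanks
declare @v10 varchar(10) = 'cat'
declare @c3  char(3)     = 'cat'

-- All three hold the word 'cat'
if @c10 = 'cat'  print 'char(10) = ''cat'''
if @v10 = 'cat'  print 'varchar(10) = ''cat'''
if @c3  = 'cat'  print 'char(3) = ''cat'''
if @c10 = @v10   print 'char(10) = varchar(10)'

-- ...even though the stored bytes differ
print DATALENGTH(@c10)  -- 10
print DATALENGTH(@v10)  -- 3
print DATALENGTH(@c3)   -- 3
```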

Response to JohnFx's comment:

See Using char and varchar Data in Books Online. Quoting from that page, emphasis mine:

Each char and varchar data value has a collation. Collations define attributes such as the bit patterns used to represent each character, comparison rules, and sensitivity to case or accenting.

I agree it could be easier to find, but it's documented.
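Because the comparison rules come from the collation, you can change them per expression with COLLATE. This sketch assumes the built-in Latin1_General_CI_AS (case-insensitive) and Latin1_General_CS_AS (case-sensitive) collations. Note that switching to a case-sensitive collation does not make trailing blanks significant.

```sql
-- Case-insensitive: 'cat' and 'CAT' are the same word
if 'cat' = 'CAT' collate Latin1_General_CI_AS print 'equal' else print 'not equal'   -- equal

-- Case-sensitive: they are not
if 'cat' = 'CAT' collate Latin1_General_CS_AS print 'equal' else print 'not equal'   -- not equal

-- Trailing blanks are still ignored under the case-sensitive collation
if 'cat' = 'cat ' collate Latin1_General_CS_AS print 'equal' else print 'not equal'  -- equal
```

The explicit COLLATE on one operand takes precedence over the default collation of the other literal, so each comparison uses the collation named in it.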

Worth noting, too, is that SQL's semantics, where = has to do with the real-world data and the context of the comparison (as opposed to something about bits stored on the computer), have been part of SQL for a long time. The premise of RDBMSs and SQL is the faithful representation of real-world data, hence its support for collations many years before similar ideas (such as CultureInfo) entered the realm of Algol-like languages. The premise of those languages (at least until very recently) was problem-solving in engineering, not management of business data. (Recently, the use of similar languages in non-engineering applications like search is making some inroads, but Java, C#, and so on are still struggling with their non-businessy roots.)

In my opinion, it's not fair to criticize SQL for being different from "most programming languages." SQL was designed to support a framework for business data modeling that's very different from engineering, so the language is different (and better for its goal).

Heck, when SQL was first specified, some languages didn't have any built-in string type. And in some languages still, the equals operator between strings doesn't compare character data at all, but compares references! It wouldn't surprise me if in another decade or two, the idea that == is culture-dependent becomes the norm.

answered Sep 28 '22 19:09

Steve Kass
