Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing SQL Server number literal with underscore

I wonder why it works and why it does not return an error:

SELECT 2015_11

Result:

╔══════╗
║ _11  ║
╠══════╣
║ 2015 ║
╚══════╝

Second case:

SELECT 2.1_a

╔═════╗
║ _a  ║
╠═════╣
║ 2.1 ║
╚═════╝

Checking metadata:

SELECT  name, system_type_name
FROM sys.dm_exec_describe_first_result_set(
N'SELECT 2015_11', NULL, 0) 
UNION ALL
SELECT  name, system_type_name
FROM sys.dm_exec_describe_first_result_set(
N'SELECT 3.2_a', NULL, 0) 

╔══════╦══════════════════╗
║ name ║ system_type_name ║
╠══════╬══════════════════╣
║ _11  ║ int              ║
║ _a   ║ numeric(2,1)     ║
╚══════╩══════════════════╝

While identifier that starts with letter behaves as I think it should:

SELECT a_11
-- Invalid column name 'a_11'.

LiveDemo

like image 474
Lukasz Szozda Avatar asked Mar 14 '16 12:03

Lukasz Szozda


People also ask

Can you use _ in SQL?

To broaden the selections of a structured query language (SQL-SELECT) statement, two wildcard characters, the percent sign (%) and the underscore (_), can be used.

What does _ represent in SQL?

The SQL LIKE Operator The LIKE operator is used in a WHERE clause to search for a specified pattern in a column. There are two wildcards often used in conjunction with the LIKE operator: The percent sign (%) represents zero, one, or multiple characters. The underscore sign (_) represents one, single character.

Is underscore a special character in SQL?

SQL Server T-SQL Wildcard Characters Unlike literal characters, wildcard characters have specific meaning for the LIKE operator. Hence, underscore ('_') in LIKE does not mean a specific regular character, but any single character.

How do I escape a special character in a string in SQL?

Use braces to escape a string of characters or symbols. Everything within a set of braces in considered part of the escape sequence. When you use braces to escape a single character, the escaped character becomes a separate token in the query. Use the backslash character to escape a single character or symbol.


2 Answers

SQL treats query like

SELECT 2015_11

as

SELECT 2015 _11 

which is shortcut for

SELECT 2015 AS [_11]

SQL Server expects column names to follow some naming convention rules as detailed in this MSDN link

The names of variables, functions, and stored procedures must comply with the following rules for Transact-SQL identifiers. The first character must be one of the following:

  1. A letter as defined by the Unicode Standard 3.2. The Unicode definition of letters includes Latin characters from a through z, from A through Z, and also letter characters from other languages.
  2. The underscore (_), at sign (@), or number sign (#).

    Certain symbols at the beginning of an identifier have special meaning in SQL Server. A regular identifier that starts with the at sign always denotes a local variable or parameter and cannot be used as the name of any other type of object. An identifier that starts with a number sign denotes a temporary table or procedure. An identifier that starts with double number signs (##) denotes a global temporary object. Although the number sign or double number sign characters can be used to begin the names of other types of objects, we do not recommend this practice.

Some Transact-SQL functions have names that start with double at signs (@@). To avoid confusion with these functions, you should not use names that start with @@.

Also the syntax for SELECT as per MSDN is like

SELECT [ ALL | DISTINCT ] [ TOP ( expression ) [ PERCENT ] [ WITH TIES ] ] ::= { * | { table_name | view_name | table_alias }.* | { [ { table_name | view_name | table_alias }. ] { column_name | $IDENTITY | $ROWGUID } | udt_column_name [ { . | :: } { { property_name | field_name } | method_name ( argument [ ,...n] ) } ] | expression [ [ AS ] column_alias ] } | column_alias = expression } [ ,...n ]

In this case the SQL parser first checks for table name, and then column name, Identity and rowguid, and so on till it hits the match with

| expression [ [ AS ] column_alias ]

It then reads literal value till underscore character which is when it realizes that literal must have ended and starts parsing the later characters as Column_alias without as explicit AS

To verify this try following code in SQL server

SELECT 2015AS _11

This will produce same results as

SELECT 2015_11

Also to verify what I just wrote above see the screenshot from SSMS which does a code highlighting on AS

enter image description here

In your first example 2015 is integer literal and in second example 2.1 is decimal literal

In your third example a is not a valid literal. If you try

SELECT 'a'_8

This will give you result like

╔═════╗
║ _8  ║
╠═════╣
║ a   ║
╚═════╝

PS: You'll see that this works pretty much the same way with # as well

So SELECT 2015#11 will give similar results

╔══════╗
║ #11  ║
╠══════╣
║ 2015 ║
╚══════╝
like image 93
DhruvJoshi Avatar answered Oct 16 '22 07:10

DhruvJoshi


To understand what is happening, you need to understand what SQL Server accepts as identifiers. There are a lot of rules, which are documented here. But, the important one is:

The first character must be one of the following:

  1. A letter as defined by the Unicode Standard 3.2. The Unicode definition of letters includes Latin characters from a through z, from A through Z, and also letter characters from other languages.

  2. The underscore (_), at sign (@), or number sign (#).

The important point is that when the SQL Server parser encounters a digit, it says to itself: "This is a number". When it hits the underscore, it says "Well, no more number, must be starting something else". The parser recognizes the second component as a valid identifier, so this is treated as:

select 2015 _11

which is for a column alias, even without the as.

like image 20
Gordon Linoff Avatar answered Oct 16 '22 08:10

Gordon Linoff