Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Strange behaviour of SQL order by when the string consists space

SQL Example 1:

SELECT TestField 
FROM (VALUES('Ne'), ('NE')) AS TestTable(TestField)
ORDER BY TestField COLLATE Latin1_General_CS_AS

Result 1:

Ne  
NE

SQL Example 2 (between NE and a there are 2 spaces, while between Ne and a there is only 1):

SELECT TestField 
FROM (VALUES('Ne a'), ('NE  a')) AS TestTable(TestField)
ORDER BY TestField COLLATE Latin1_General_CS_AS

Result 2:

NE  a  
Ne a

Can someone explain it?

Thanks

like image 907
Zoltán Hartai Avatar asked Sep 29 '22 13:09

Zoltán Hartai


2 Answers

Second query:

SELECT TestField 
FROM (VALUES
    ('Ne a'), 
    ('NE  a')
--    12345
) AS TestTable(TestField)
ORDER BY TestField COLLATE Latin1_General_CS_AS

For case sensitive collations, differences generated by alphabetic order (position 4: < a) is more important than differences generated by case order (position 2: e < E). Thus NE a is before Ne a .

Another example: difference between and a (position 2) is more important than case order (position 1: e vs E):

SELECT '{' + TestField  + '}'
FROM (VALUES
    ('ea'), 
    ('E ') -- or ('E')
--    12
) AS TestTable(TestField)
ORDER BY TestField COLLATE Latin1_General_CS_AS
/*
TestField
---------
{E }
{ea}
*/

More details on Rusanu's blog.

Update #1: You could use SQL_EBCDIC037_CP1_CS_AS collation:

SELECT TestField 
FROM (VALUES
    ('Ne a'), 
    ('NE  a')
--    12345
) AS TestTable(TestField)
ORDER BY TestField COLLATE SQL_EBCDIC037_CP1_CS_AS
/*
TestField
---------
Ne a
NE  a
*/

but this collation could generate some strange results.

Example:

SELECT  x.ColA AS ColA_Latin1_General_CS_AS
FROM (
    SELECT  'A'
    UNION ALL 
    SELECT  'AB'
    UNION ALL 
    SELECT  'ABC'
    UNION ALL   
    SELECT  'zzzz'
) x(ColA)
ORDER BY x.ColA COLLATE Latin1_General_CS_AS
/*
ColA_Latin1_General_CS_AS
----------------------------
A
AB
ABC
zzzz
*/

vs.

SELECT  x.ColA AS ColA_SQL_EBCDIC037_CP1_CS_AS
FROM (
    SELECT  'A'
    UNION ALL 
    SELECT  'AB'
    UNION ALL 
    SELECT  'ABC'
    UNION ALL   
    SELECT  'zzzz'
) x(ColA)
ORDER BY x.ColA COLLATE SQL_EBCDIC037_CP1_CS_AS
/*
ColA_SQL_EBCDIC037_CP1_CS_AS
----------------------------
zzzz
A
AB
ABC
*/

Note: I never used SQL_EBCDIC037_CP1_CS_AS collation and I do not recommend.

Update #2: Text values are splited in two (or more) columns

-- Scenario #1: before/during insert/update, spaces are trimmed with LTRIM
SELECT TestField1 F1,  TestFiel2 AS F2
FROM (VALUES
    ('JOHN', 'ZOE'),  
    ('JOHN', 'Albano')
) AS TestTable(TestField1, TestFiel2)
ORDER BY TestField1 COLLATE Latin1_General_CS_AS, TestFiel2 COLLATE Latin1_General_CS_AS
/*
F1   F2
---- ------
JOHN Albano
JOHN ZOE
*/
-- Scenario #2: during insert/update spaces are not trimmed (with LTRIM)
SELECT LTRIM(TestField1) COLLATE Latin1_General_CS_AS AS F1,  LTRIM(TestFiel2) COLLATE Latin1_General_CS_AS AS F2
FROM (VALUES
    ('JOHN', ' ZOE'),  -- 1 extra space 
    ('JOHN', 'Albano')
) AS TestTable(TestField1, TestFiel2)
ORDER BY F1, F2 
/*
F1   F2
---- ------
JOHN Albano
JOHN ZOE
*/

Note: I would use solution described in Scenario #1.

like image 169
Bogdan Sahlean Avatar answered Oct 02 '22 04:10

Bogdan Sahlean


Your second sample has two spaces in the value NE a - therefore, this will be ordered before the Ne a value with just a single space (because the (second) space comes before the a value).

If you reduce the second value to also include just a single space, you'll get the same ordering as in smaple #1:

SELECT TestField 
FROM (VALUES('Ne a'), ('NE a')) AS TestTable(TestField)
ORDER BY TestField COLLATE Latin1_General_CS_AS

Output:

Ne a
NE a
like image 36
marc_s Avatar answered Oct 02 '22 04:10

marc_s