Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Range wildcard pattern matching behaviour with case-sensitive collations

Using PATINDEX and a case-sensitive collation to search for upper-case letters in a string I noticed this was not yielding the desired result:

-- returns 1
SELECT PATINDEX('%[A-Z]%'
                    , 'abCde' COLLATE SQL_Latin1_General_Cp1_CS_AS); 

however, specifying every letter, A-Z, does:

-- returns 3
SELECT PATINDEX('%[ABCDEFGHIJKLMNOPQRSTUVWXYZ]%'
                    , 'abCde' COLLATE SQL_Latin1_General_Cp1_CS_AS); 

Is my understanding of using a range in the first case incorrect? Why is the behaviour like this?

like image 895
Pero P. Avatar asked Nov 18 '10 06:11

Pero P.


People also ask

Are wildcards case sensitive SQL?

In conclusion it is possible to do wildcard case in-sensitive search on oracle using EQUALS_IGNORE_CASE comparisonOperator. For MS SQL Server the default behavior is that all string comparisons are case insensitive so there is no issue.

Is SQL pattern matching case sensitive?

SQL pattern matching enables you to use _ to match any single character and % to match an arbitrary number of characters (including zero characters). In MySQL, SQL patterns are case-insensitive by default.

Is Patindex case sensitive?

The PATINDEX() function returns the position of a pattern in a string. If the pattern is not found, this function returns 0. Note: The search is case-insensitive and the first position in string is 1.

How do you do case-insensitive comparison in SQL?

To do a case-insensitive comparison, use the ILIKE keyword; e.g., column ILIKE 'aBc' and column ILIKE 'ABC' both return TRUE for 'abc' . In contrast, MySQL and MS SQL Server have case-insensitive behaviors by default. This means WHERE column = 'abc' returns TRUE for e.g., 'abc' , 'ABC' , or 'aBc' .


1 Answers

Unfortunately, the range operators are a bit funny. The range of letters from A-Z is:

AbBcCdDeE...yYzZ

That is, lower case characters immediately precede their upper case counterpart. This is also fun because if you want to deal with both upper and lower case characters, in a case sensitive collation, the range A-Z excludes lower case a.


I should say the above, regarding how the range expands out, is based on the collations I generally work with. How the range actually expands is collation dependent. If you can find a collation where, for instance, all upper case characters occur before all lower case characters, then the range would work as you expect. (Possibly one of the binary collations?)

like image 170
Damien_The_Unbeliever Avatar answered Oct 12 '22 23:10

Damien_The_Unbeliever