SQL Match Special Characters Regexp

I'm looking for an SQL statement that will return only rows of my table whose Name field contains special characters (excluding underscores).

I've tried:

SELECT * FROM `table` WHERE Name REGEXP '^[!#$%&()*+,\-./:;<=>?@[\\\]^`{|}~]+$'

But no dice, this returns an empty result set (despite there being rows I specifically added with Name fields containing %, $, and # characters).

saricden asked Jan 30 '13 13:01



2 Answers

The first problem seems to be the ^ and $ anchors (Mike C explained why faster than I could...)

But I see escaping problems too: the characters that have a special meaning inside the [] (namely [, ], ^ and -) cannot simply be escaped there; they have to be placed in specific positions instead.

Here is a question about how to escape special characters inside character groups in MySQL regexes.

The conclusion is detailed in the regex documentation:

A bracket expression is a list of characters enclosed in '[]'. It normally matches any single character from the list (but see below).

  • If the list begins with '^', it matches any single character (but see below) not from the rest of the list.

  • If two characters in the list are separated by '-', this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g. '[0-9]' in ASCII matches any decimal digit.

  • It is illegal(!) for two ranges to share an endpoint, e.g. 'a-c-e'. Ranges are very collating sequence-dependent, and portable programs should avoid relying on them.

  • To include a literal ']' in the list, make it the first character (following a possible '^').

  • To include a literal '-', make it the first or last character, or the second endpoint of a range.

  • To use a literal '-' as the first endpoint of a range, enclose it in '[.' and '.]' to make it a collating element (see below).

With the exception of these and some combinations using '[' (see next paragraphs), all other special characters, including '\', lose their special significance within a bracket expression.
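The placement rules above are mechanical enough to apply with a small script. This is a minimal Python sketch under stated assumptions, not a vetted tool: build_bracket and mysql_literal are hypothetical helper names, the rules are those of the Henry Spencer engine quoted above (which MySQL used before 8.0.4; from 8.0.4 MySQL uses ICU, where a backslash does escape inside brackets, so the output would need rework there), and the literal quoting assumes the default sql_mode without NO_BACKSLASH_ESCAPES.

```python
# Hypothetical helpers (illustrative names, not a library API).
# Assumes MySQL's pre-8.0.4 Henry Spencer engine, where '\' is an ordinary
# character inside [ ], and the default sql_mode (backslash escapes enabled).


def build_bracket(chars: str) -> str:
    """Return a POSIX bracket expression matching any one char in `chars`."""
    members = sorted(set(chars))
    if not members:
        raise ValueError("need at least one character")
    # Sorting puts '.', ':' and '=' before '[', so '[' is never followed by
    # them and cannot accidentally open a [. .], [= =] or [: :] element.
    # Keeping '-' out of the middle means no accidental ranges can form.
    middle = [c for c in members if c not in "]^-"]
    body = "]" if "]" in members else ""  # a literal ']' must come first
    body += "".join(middle)
    if "^" in members:
        body += "^"  # '^' is literal anywhere except the first position
    if "-" in members:
        body += "-"  # a literal '-' goes last
    if body.startswith("^"):
        # Only '^' (and maybe '-') was given: lead with '-' instead.
        if "-" not in members:
            raise ValueError("a lone '^' cannot form a bracket expression")
        body = "-^"
    return "[" + body + "]"


def mysql_literal(text: str) -> str:
    """Quote `text` as a MySQL string literal (double backslashes first)."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


# The characters the question's pattern was trying to list (no underscore).
QUESTION_CHARS = "!#$%&()*+,-./:;<=>?@[\\]^`{|}~"

if __name__ == "__main__":
    print("WHERE Name REGEXP " + mysql_literal(build_bracket(QUESTION_CHARS)))
```

Run as a script, it prints WHERE Name REGEXP '[]!#$%&()*+,./:;<=>?@[\\`{|}~^-]'. Here ']' leads, '^' and '-' trail, and the backslash is doubled only for the string-literal layer. No anchors are needed because REGEXP succeeds when the pattern matches anywhere in the value.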

EDIT: Here is an SQL Fiddle with some interesting regexes involving the ] character.

DDL:

create table txt ( txt varchar(200) );

insert into txt values ('ab[]cde');
insert into txt values ('ab[cde');
insert into txt values ('ab]cde');
insert into txt values ('ab[]]]]cde');
insert into txt values ('ab[[[[]cde');
insert into txt values ('ab\\]]]]cde');
insert into txt values ('ab[wut?wut?]cde');

Queries:

Naive approach to match a group of [ and ] chars. It is syntactically OK, but the group contains only the [ char, and the + then applies to a literal ] that follows it:

SELECT * FROM txt WHERE txt 
REGEXP 'ab[[]]+cde';
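To see how the engine splits these patterns, here is a deliberately simplified Python model of the bracket rules quoted earlier. parse_bracket is a hypothetical helper, not MySQL's actual parser: it ignores the [: :], [= =] and [. .] forms and, as in POSIX, treats a backslash as an ordinary character inside the brackets.

```python
# Simplified model of POSIX bracket-expression parsing (hypothetical helper).
# Ignores [:class:], [=equiv=] and [.coll.] forms; backslash is ordinary.


def parse_bracket(pattern: str, start: int) -> tuple[set[str], bool, int]:
    """Parse the bracket expression at pattern[start] (which must be '[').

    Returns (members, negated, index just past the closing ']').
    """
    if pattern[start] != "[":
        raise ValueError("expected '[' at start")
    i = start + 1
    negated = i < len(pattern) and pattern[i] == "^"
    if negated:
        i += 1
    members: set[str] = set()
    first = True
    while i < len(pattern):
        c = pattern[i]
        if c == "]" and not first:
            return members, negated, i + 1  # first non-leading ']' closes
        if i + 2 < len(pattern) and pattern[i + 1] == "-" and pattern[i + 2] != "]":
            # 'x-y' is a range; a '-' right before the closing ']' is literal
            members.update(chr(o) for o in range(ord(c), ord(pattern[i + 2]) + 1))
            i += 3
        else:
            members.add(c)
            i += 1
        first = False
    raise ValueError("unterminated bracket expression")


if __name__ == "__main__":
    for pat in ["ab[[]]+cde", "ab[][]+cde", "ab[]wut?[]+cde"]:
        members, _, end = parse_bracket(pat, 2)
        print(pat, sorted(members), "then", pat[end:])
```

For ab[[]]+cde the group is just '[', and ']+cde' is left outside it. For ab[][]+cde and ab[]wut?[]+cde the leading ']' is a member and the group ends at the second ']'. Fed the question's pattern as the engine receives it after MySQL has unescaped the string literal (^[!#$%&()*+,-./:;<=>?@[\]^`{|}~]+$), the model closes the class at '\]' and leaves ^`{|}~]+$ outside it. Under these rules, that second '^' becomes an anchor in the middle of the pattern.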

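Escaped -> same result, because MySQL's string-literal parser turns \] into ] before the regex engine sees the pattern: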

SELECT * FROM txt WHERE txt 
REGEXP 'ab[[\]]+cde';

Double escape -> doesn't work either: \\ becomes a single \, which is an ordinary character inside a bracket expression, so the group is now [ and \:

SELECT * FROM txt WHERE txt 
REGEXP 'ab[[\\]]+cde';
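The escape puzzle comes from the two layers involved: MySQL first processes the string literal, and only the result reaches the regex engine. Below is a rough Python model of that first layer. mysql_unescape is a hypothetical name, and the sketch assumes the default sql_mode (NO_BACKSLASH_ESCAPES off). It covers the escapes in the MySQL manual: \0, \', \", \b, \n, \r, \t, \Z and \\ map to characters, \% and \_ keep their backslash, and any other \x becomes plain x.

```python
# Rough model of MySQL string-literal escapes (hypothetical helper).
# Assumes the default sql_mode, i.e. NO_BACKSLASH_ESCAPES is off.
MYSQL_ESCAPES = {
    "0": "\0", "'": "'", '"': '"', "b": "\b", "n": "\n",
    "r": "\r", "t": "\t", "Z": "\x1a", "\\": "\\",
}


def mysql_unescape(body: str) -> str:
    """Return what the regex engine sees for the literal text `body`."""
    out = []
    i = 0
    while i < len(body):
        c = body[i]
        if c == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in "%_":
                out.append("\\" + nxt)  # \% and \_ keep their backslash
            else:
                # Listed escapes map to special characters; any other \x is x.
                out.append(MYSQL_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(c)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for literal in [r"ab[[]]+cde", r"ab[[\]]+cde", r"ab[[\\]]+cde"]:
        print(literal, "->", mysql_unescape(literal))
```

With this model, 'ab[[\]]+cde' reaches the engine as ab[[]]+cde, identical to the naive query. 'ab[[\\]]+cde' reaches it as ab[[\]]+cde, a group of '[' and '\'.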

Swapping the closing bracket with the opening one inside the group works: a ] right after the opening [ is a literal, so the group is ] and [. This is the weirdest regex I have ever written (so far...):

SELECT * FROM txt WHERE txt 
REGEXP 'ab[][]+cde';

I think such a (totally valid!) regex will kill me in some weird nightmare:

SELECT * FROM txt WHERE txt 
REGEXP 'ab[]wut?[]+cde';
ppeterka answered Sep 27 '22 00:09


Your regex only matches names that consist ENTIRELY of special characters. You specify the caret (^), which signifies the start of the string, then your character class with your list of special characters, the plus sign (+) to indicate one or more, and then the dollar sign ($) to signify the end of the string. You need to account for non-special characters in the string. You could try something like this:

WHERE Name REGEXP '^.*?[!#$%&()*+,\-./:;<=>?@[\\\]^`{|}~]+.*?$'

I added the .*? at the beginning and end to allow for non-special characters before and after the special character. BTW, you probably don't need the (+) any more, since one special character is enough for a match.
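The difference between anchoring the class and searching for it anywhere can be seen in a short Python sketch. It uses Python's re.search as a stand-in for REGEXP, since both succeed if the pattern matches anywhere in the value. It also uses a deliberately simple class, [%$#], that means the same thing in Python and in a POSIX bracket expression. The sample names are made up, and Python's bracket escaping differs from MySQL's, so this only illustrates the anchoring.

```python
import re

# Made-up sample values; only "$$$" consists entirely of special characters.
names = ["plain", "50%off", "a#b", "$$$"]
SPECIAL = "[%$#]"  # same meaning in Python re and in a POSIX bracket expression

# Anchored, as in the question: the whole value must be special characters.
anchored = [n for n in names if re.search("^" + SPECIAL + "+$", n)]
# Padded with .* on both sides, in the spirit of the answer above.
wrapped = [n for n in names if re.search("^.*" + SPECIAL + ".*$", n)]
# Unanchored: one special character anywhere is enough.
unanchored = [n for n in names if re.search(SPECIAL, n)]

print(anchored)
print(wrapped)
print(unanchored)
```

Only "$$$" survives the anchored pattern. The padded and unanchored versions both keep the three names that contain a special character, so with a match-anywhere operator the anchors and the padding can be dropped.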

Mike C answered Sep 27 '22 00:09