Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server 2008, searching for special characters

I'm trying to locate some bad data that has been inserted into a table. Likely by someone doing a copy/paste from Word then inserting into the database.

I have seen the similar questions like Query for finding rows with special characters

but this doesn't quite work for what I'm needing. Essentially I want to only return back a data set not including any standard characters and catch things such as an endash (just one example).

I have tried using something like this

SELECT * FROM mytable WHERE email LIKE '%[^0-9a-zA-Z \-@\.]%'

but it returns back every single single record.


EDIT

In case it is of benefit for anyone else that comes along later. Ultimately the issue I was having was due to the placement of the hyphen (-) character as was also noted by sgmoore below. I moved this to the beginning of my range just following the not (^) character.

Also, based on the info provided by gbn that LIKE is not actually using regexes I revisited the Microsoft documentation here SQL Server LIKE Statement. I was using the backslash character unnecessarily as an escape character due to my assumption it was the same as a regex. These were unnecessary, and apparently escape characters are only needed with wildcard characters. The doc I linked also mentions using an ESCAPE clause following the LIKE range to specify what character is to be used as an escape character e.g. WHERE percent_complete LIKE '%50!%' ESCAPE '!' would match a string that actually ends in 50% (50%, 150%).

Here is what I ended up using to screen my email data for bad characters; for me it works, but it may not be complete for all cases.

SELECT * FROM mytable WHERE email LIKE '%[^-0-9a-zA-Z_@.]%'

also if it is helpful, I needed to do something similar on a couple of other generic text fields; this far from comprehensive, but it narrowed my result set down to just a handful of records that I was then able to visually determine what I was looking for.

SELECT * from mytable WHERE text_field LIKE '%[^-0-9a-zA-Z @.''?:/,+&();_]%'

like image 954
bigtunacan Avatar asked Apr 03 '13 15:04

bigtunacan


2 Answers

Try

SELECT * FROM mytable WHERE email LIKE '%[^0-9a-zA-Z @\.\-]%'

It would look like the position of the - sign on your version is causing problems.

like image 183
sgmoore Avatar answered Oct 15 '22 17:10

sgmoore


Use double negatives

... WHERE email NOT LIKE '%[^0-9a-zA-Z ,-@\.]%'

Sample data would be useful too

like image 20
gbn Avatar answered Oct 15 '22 19:10

gbn