I'm trying to locate some bad data that has been inserted into a table. Likely by someone doing a copy/paste from Word then inserting into the database.
I have seen the similar questions like Query for finding rows with special characters
but this doesn't quite work for what I'm needing. Essentially I want to only return back a data set not including any standard characters and catch things such as an endash (just one example).
I have tried using something like this
SELECT * FROM mytable WHERE email LIKE '%[^0-9a-zA-Z \-@\.]%'
but it returns back every single single record.
In case it is of benefit for anyone else that comes along later. Ultimately the issue I was having was due to the placement of the hyphen (-) character as was also noted by sgmoore below. I moved this to the beginning of my range just following the not (^) character.
Also, based on the info provided by gbn that LIKE is not actually using regexes I revisited the Microsoft documentation here SQL Server LIKE Statement. I was using the backslash character unnecessarily as an escape character due to my assumption it was the same as a regex. These were unnecessary, and apparently escape characters are only needed with wildcard characters. The doc I linked also mentions using an ESCAPE clause following the LIKE range to specify what character is to be used as an escape character e.g. WHERE percent_complete LIKE '%50!%' ESCAPE '!' would match a string that actually ends in 50% (50%, 150%).
Here is what I ended up using to screen my email data for bad characters; for me it works, but it may not be complete for all cases.
SELECT * FROM mytable WHERE email LIKE '%[^-0-9a-zA-Z_@.]%'
also if it is helpful, I needed to do something similar on a couple of other generic text fields; this far from comprehensive, but it narrowed my result set down to just a handful of records that I was then able to visually determine what I was looking for.
SELECT * from mytable WHERE text_field LIKE '%[^-0-9a-zA-Z @.''?:/,+&();_]%'
Try
SELECT * FROM mytable WHERE email LIKE '%[^0-9a-zA-Z @\.\-]%'
It would look like the position of the - sign on your version is causing problems.
Use double negatives
... WHERE email NOT LIKE '%[^0-9a-zA-Z ,-@\.]%'
Sample data would be useful too
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With