Find non-ASCII characters in varchar columns using SQL Server

Tags: sql-server, tsql, sql-server-2005, non-ascii-characters

How can rows with non-ASCII characters be returned using SQL Server?
If you can show how to do it for one column, that would be great.

I am doing something like this now, but it is not working:

select *
from Staging.APARMRE1 as ar
where ar.Line like '%[^!-~ ]%'
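One likely reason the attempt above misbehaves: in a LIKE pattern, a range such as !-~ is evaluated with the sort order of the collation in effect, not by character code, so under a dictionary collation it does not correspond to codes 33 through 126. The sketch below evaluates the same pattern under Latin1_General_CI_AS and under the binary Latin1_General_BIN collation side by side. The sample values are hypothetical (taken from the question's example output), and it assumes a database whose default collation uses code page 1252.

```sql
-- Hypothetical sample values from the question's example output.
-- Assumes a database whose default collation uses code page 1252 (Latin1),
-- so the accented literals survive as varchar.
select v.Txt,
    case when v.Txt collate Latin1_General_CI_AS like '%[^!-~ ]%' then 1 else 0 end as MatchCI_AS,
    case when v.Txt collate Latin1_General_BIN   like '%[^!-~ ]%' then 1 else 0 end as MatchBIN
from (
    select 'Solís' union all
    select 'François' union all
    select '123 Ümlaut street' union all
    select 'Plain text'
) as v (Txt)
-- MatchBIN should be 1 for the three accented values and 0 for 'Plain text'.
-- MatchCI_AS depends on where each character sorts in that collation.
```

With the binary collation the range is compared byte by byte, which is what the question's definition of "invalid" needs.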

For extra credit, if it can span all varchar columns in a table, that would be outstanding! In this solution, it would be nice to return three columns:

The identity field for that record. (This will allow the whole record to be reviewed with another query.)
The column name
The text with the invalid character

 Id | FieldName | InvalidText       |
----+-----------+-------------------+
 25 | LastName  | Solís             |
 56 | FirstName | François          |
100 | Address1  | 123 Ümlaut street |

Invalid characters would be any outside the range of SPACE (32₁₀) through ~ (126₁₀).

asked Oct 08 '10 14:10

Gerhard Weiss

2 Answers

Here is a solution for the single-column search using PATINDEX.
It also displays the position of the first invalid character, the character itself, and its ASCII code.

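-- Latin1_General_BIN makes the range !-~ compare by character code (33-126),
-- so together with the space the pattern flags anything outside 32-126.
-- ascii() returns the code-page value of the first invalid character.
select Line,
  patindex('%[^ !-~]%' COLLATE Latin1_General_BIN, Line) as [Position],
  substring(Line, patindex('%[^ !-~]%' COLLATE Latin1_General_BIN, Line), 1) as [InvalidCharacter],
  ascii(substring(Line, patindex('%[^ !-~]%' COLLATE Latin1_General_BIN, Line), 1)) as [ASCIICode]
from Staging.APARMRE1
where patindex('%[^ !-~]%' COLLATE Latin1_General_BIN, Line) > 0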
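To try the query without access to Staging.APARMRE1, here is a self-contained sketch. The table variable @Sample and its rows are hypothetical (taken from the question's example output), and it assumes a database whose default collation uses code page 1252. It computes the PATINDEX result once through CROSS APPLY (available since SQL Server 2005) instead of repeating the expression.

```sql
-- Hypothetical stand-in for Staging.APARMRE1; assumes code page 1252.
-- Written without SQL Server 2008-only syntax.
declare @Sample table (Line varchar(100))
insert into @Sample (Line)
select 'Solís' union all
select 'François' union all
select '123 Ümlaut street' union all
select 'Plain text'

select s.Line,
       p.[Position],
       substring(s.Line, p.[Position], 1) as [InvalidCharacter],
       ascii(substring(s.Line, p.[Position], 1)) as [ASCIICode]
from @Sample as s
cross apply (
    -- position of the first character outside SPACE (32) .. ~ (126), 0 if none
    select patindex('%[^ !-~]%' COLLATE Latin1_General_BIN, s.Line) as [Position]
) as p
where p.[Position] > 0
-- Expected rows under code page 1252 (order not guaranteed):
--   Solís              4  í  237
--   François           5  ç  231
--   123 Ümlaut street  5  Ü  220
```

Because ASCII() returns the code-page value, the "ASCII" codes above 127 are really code page 1252 values.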

answered Oct 19 '22 17:10

Gerhard Weiss

I've been running this bit of code successfully:

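declare @UnicodeData table (
     data nvarchar(500)
)
-- multi-row VALUES requires SQL Server 2008 or later
insert into 
    @UnicodeData
values 
    (N'Horse�')
    ,(N'Dog')
    ,(N'Cat')

-- Characters the varchar code page cannot hold are replaced during the cast
-- (with ? or a best-fit character), so those rows no longer compare equal
-- under a binary collation
select
    data
from
    @UnicodeData 
where
    data collate LATIN1_GENERAL_BIN != cast(data as varchar(max))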

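Which works well for known columns. Note that it only flags characters that change when converted to varchar, so it targets nvarchar data: an accented letter that exists in the code page (such as é under a Latin1 collation) compares equal and is not reported.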
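The cast comparison and a binary-collation range check flag different things. A small sketch contrasting them, assuming a database whose default collation uses code page 1252 (where é exists but Ω, U+03A9, does not); the table variable and its rows are hypothetical:

```sql
-- Hypothetical sample data; assumes a code page 1252 database collation.
declare @UnicodeData table (data nvarchar(500))
insert into @UnicodeData (data)
select N'Café' union all
select nchar(937) + N'mega' union all   -- nchar(937) is Ω (U+03A9)
select N'Dog'

select data,
    -- 1 when the value changes on conversion to varchar (the cast check)
    case when data collate Latin1_General_BIN != cast(data as varchar(max))
         then 1 else 0 end as LostInVarchar,
    -- 1 when the value has any character outside SPACE (32) .. ~ (126)
    case when data collate Latin1_General_BIN like N'%[^ -~]%'
         then 1 else 0 end as OutsideSpaceToTilde
from @UnicodeData
-- Expected: Café 0 1, Ωmega 1 1, Dog 0 0
```

Only the range check matches the question's definition of invalid characters; the cast check answers the narrower question of whether the data would survive a move to varchar.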

For extra credit, I wrote this quick script to search all nvarchar columns in a given table for characters that cannot be stored as varchar.

declare 
    @sql    nvarchar(max)   = N''
    ,@table sysname         = N'mytable' -- enter your table here

;with ColumnData as (
    select
        ColumnName          = quotename(c.COLUMN_NAME)
        ,TableName          = quotename(c.TABLE_SCHEMA) + '.' + quotename(c.TABLE_NAME)
    from
        INFORMATION_SCHEMA.COLUMNS c
    where
        c.DATA_TYPE         = 'nvarchar'
        and c.TABLE_NAME    = @table
)
-- Every fragment starts with 'union all ', so the finished string does not
-- depend on the (undefined) order in which the rows are concatenated
select
    @sql = @sql + 'union all select FieldName = ''' + c.ColumnName + ''', InvalidCharacter = ' + c.ColumnName + ' from ' + c.TableName + ' where ' + c.ColumnName + ' collate LATIN1_GENERAL_BIN != cast(' + c.ColumnName + ' as varchar(max)) ' + char(13)
from
    ColumnData c

-- remove the leading 'union all ' (10 characters)
set @sql = stuff(@sql, 1, 10, '')

-- check
-- print @sql
exec (@sql)

I'm not a fan of dynamic SQL, but it does have its uses for exploratory queries like this.
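Neither script returns the record's identity value, which the question asked for. A sketch of that variation follows; it is not from either answer. It assumes the table has an identity column (looked up in sys.identity_columns), that @schema and @table are replaced with real names, and that the character columns use a Latin1-based collation. It checks char, varchar, nchar and nvarchar columns with the binary range pattern rather than the varchar round-trip, and avoids SQL Server 2008-only syntax.

```sql
-- Sketch only: @schema and @table are placeholders for your own names.
declare @schema sysname, @table sysname, @idColumn sysname, @sql nvarchar(max)
set @schema = N'dbo'
set @table  = N'mytable'   -- enter your table here
set @sql    = N''

-- Identity column of the target table. If there is none, @idColumn stays
-- NULL, @sql becomes NULL below, and nothing is executed.
select @idColumn = ic.name
from sys.identity_columns ic
where ic.object_id = object_id(quotename(@schema) + N'.' + quotename(@table))

-- One SELECT per character column; every fragment starts with 'union all ',
-- so the result does not depend on concatenation order.
select @sql = @sql
    + N'union all select Id = ' + quotename(@idColumn)
    + N', FieldName = N''' + replace(c.COLUMN_NAME, N'''', N'''''') + N''''
    + N', InvalidText = cast(' + quotename(c.COLUMN_NAME) + N' as nvarchar(max)) collate database_default'
    + N' from ' + quotename(c.TABLE_SCHEMA) + N'.' + quotename(c.TABLE_NAME)
    + N' where ' + quotename(c.COLUMN_NAME)
    + N' collate Latin1_General_BIN like N''%[^ -~]%'' '
    + nchar(13) + nchar(10)
from INFORMATION_SCHEMA.COLUMNS c
where c.TABLE_SCHEMA = @schema
  and c.TABLE_NAME   = @table
  and c.DATA_TYPE in ('char', 'varchar', 'nchar', 'nvarchar')

-- Strip the leading 'union all ' (10 characters), then inspect or run it
set @sql = stuff(@sql, 1, 10, N'')
-- print @sql
exec (@sql)
```

The cast to nvarchar(max) with collate database_default gives every UNION ALL branch the same type and collation, so columns with different lengths or collations can be combined into the Id / FieldName / InvalidText shape from the question.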

answered Oct 19 '22 17:10

Vash