
Replace Unicode characters in T-SQL

How do I replace only the last character of the string:

select REPLACE('this is the news with a þ', 'þ', '__')

The result I'm getting is:

__is is __e news wi__ a __

EDIT: The collation of the server and the database is Latin1_General_CI_AS.

The actual query I'm running is REPLACE(note, 'þ', '') where note is an ntext column. The point is to strip out the thorn characters because that character gets used later in the process as a column delimiter. (Please don't suggest changing the delimiter, that's just not going to happen given the extent to which it's been used!)

I've also tried using the N prefix, even on the test SELECT statement above. Still broken!

Sean asked Mar 12 '15 14:03





1 Answer

The þ character (value 254, i.e. U+00FE, in ISO-8859-1, ANSI Code Page 1252, and Unicode) is known as "thorn", and in some languages it equates directly to th. That is why REPLACE under a linguistic collation also matches every th in the string:

  • Technical info on the character here: http://unicode-table.com/en/00FE/

  • Explanation of that character and collations here: http://userguide.icu-project.org/collation/customization. Search the page (typically with Ctrl+F) for "Complex Tailoring Examples" and you will see the following:

    The letter 'þ' (THORN) is normally treated by UCA/root collation as a separate letter that has primary-level sorting after 'z'. However, in Swedish and some other Scandinavian languages, 'þ' and 'Þ' should be treated as just a tertiary-level difference from the letters "th" and "TH" respectively.
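
The effect of that tailoring can be checked directly with an equality test. The following is a minimal sketch, assuming the Latin1_General_CI_AS collation mentioned in the question; the column aliases are made up for illustration:

```sql
-- Compare thorn with 'th' under the question's collation and under a binary one.
-- The explicit COLLATE clause decides which comparison rules each test uses.
SELECT
    CASE WHEN N'þ' = N'th' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'different' END AS linguistic_comparison,
    CASE WHEN N'þ' = N'th' COLLATE Latin1_General_100_BIN2
         THEN 'equal' ELSE 'different' END AS binary_comparison;
```

Going by the output shown in the question, the first column should come back as 'equal' and the second as 'different'.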

If you do not want þ to equate to th, then force a Binary collation as follows:

SELECT REPLACE(N'this is the news with a þ' COLLATE Latin1_General_100_BIN2,
                 N'þ', N'__');

Returns:

this is the news with a __
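
For the actual column in the question, note that REPLACE does not accept ntext arguments, so the column has to be converted first. A sketch, assuming a hypothetical table dbo.Notes (only the column name note and its ntext type come from the question):

```sql
-- dbo.Notes is a hypothetical table name; "note" is the ntext column from the question.
-- CAST makes the value usable by REPLACE; COLLATE forces a code-point comparison.
SELECT REPLACE(CAST(note AS nvarchar(max)) COLLATE Latin1_General_100_BIN2,
               N'þ', N'') AS note_without_thorn
FROM dbo.Notes;
```

Because a binary collation compares code points, this strips only the lowercase þ. If an uppercase Þ also has to go, it would need a second, nested REPLACE.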

For more info on working with collations, Unicode, encodings, etc., please visit: Collations Info

Solomon Rutzky answered Sep 18 '22 23:09
