I have the defined a function dbo.ufnRemoveSpaceCharacters
to remove all characters under the Unicode category Zs (Space Characters). All 17 of the characters able to be replaced/translated using REPLACE
and TRANSLATE
when they are the only character in the string (NCHAR(1)
or NVARCHAR(1)
), but when the characters are combined into a string, such as a a
(where the space/dash is the Ogham Space Mark 0x1680
), the character is NOT replaced. The following characters are not replaced/translated Ogham Space Mark 0x1680
, Narrow No Break Space 0x202F
, Medium Mathematical Space 0x205F
.
DROP FUNCTION IF EXISTS dbo.ufnRemoveSpaceCharacters
GO
CREATE FUNCTION dbo.ufnRemoveSpaceCharacters(@Value NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @Space NCHAR = NCHAR(0x0020);
DECLARE @NoBreakSpace NCHAR = NCHAR(0x00A0);
DECLARE @OghamSpaceMark NCHAR = NCHAR(0x1680);
DECLARE @EnQuad NCHAR = NCHAR(0x2000);
DECLARE @EmQuad NCHAR = NCHAR(0x2001);
DECLARE @EnSpace NCHAR = NCHAR(0x2002);
DECLARE @EmSpace NCHAR = NCHAR(0x2003);
DECLARE @ThreePerEmSpace NCHAR = NCHAR(0x2004);
DECLARE @FourPerEmSpace NCHAR = NCHAR(0x2005);
DECLARE @SixPerEmSpace NCHAR = NCHAR(0x2006);
DECLARE @FigureSpace NCHAR = NCHAR(0x2007);
DECLARE @PunctuationSpace NCHAR = NCHAR(0x2008);
DECLARE @ThinSpace NCHAR = NCHAR(0x2009);
DECLARE @HairSpace NCHAR = NCHAR(0x200A);
DECLARE @NarrowNoBreakSpace NCHAR = NCHAR(0x202F);
DECLARE @MediumMathematicalSpace NCHAR = NCHAR(0x205F);
DECLARE @IdeographicSpace NCHAR = NCHAR(0x3000);
DECLARE @SpaceCharacters NVARCHAR(17) =
@Space + @NoBreakSpace + @OghamSpaceMark + @EnQuad
+ @EmQuad + @EnSpace + @EmSpace + @ThreePerEmSpace
+ @FourPerEmSpace + @SixPerEmSpace + @FigureSpace
+ @PunctuationSpace + @ThinSpace + @HairSpace
+ @NarrowNoBreakSpace + @MediumMathematicalSpace + @IdeographicSpace;
DECLARE @ReplacePhrase NVARCHAR(17) = N'aaaaaaaaaaaaaaaaa';
RETURN REPLACE(TRANSLATE(@Value, @SpaceCharacters, @ReplacePhrase), @Space, '')
END
GO
You can reproduce this by using the following
DECLARE @Space NCHAR = NCHAR(0x0020);
DECLARE @NoBreakSpace NCHAR = NCHAR(0x00A0);
DECLARE @OghamSpaceMark NCHAR = NCHAR(0x1680);
DECLARE @EnQuad NCHAR = NCHAR(0x2000);
DECLARE @EmQuad NCHAR = NCHAR(0x2001);
DECLARE @EnSpace NCHAR = NCHAR(0x2002);
DECLARE @EmSpace NCHAR = NCHAR(0x2003);
DECLARE @ThreePerEmSpace NCHAR = NCHAR(0x2004);
DECLARE @FourPerEmSpace NCHAR = NCHAR(0x2005);
DECLARE @SixPerEmSpace NCHAR = NCHAR(0x2006);
DECLARE @FigureSpace NCHAR = NCHAR(0x2007);
DECLARE @PunctuationSpace NCHAR = NCHAR(0x2008);
DECLARE @ThinSpace NCHAR = NCHAR(0x2009);
DECLARE @HairSpace NCHAR = NCHAR(0x200A);
DECLARE @NarrowNoBreakSpace NCHAR = NCHAR(0x202F);
DECLARE @MediumMathematicalSpace NCHAR = NCHAR(0x205F);
DECLARE @IdeographicSpace NCHAR = NCHAR(0x3000);
DECLARE @SpaceCharacters NVARCHAR(17) =
@Space + @NoBreakSpace + @OghamSpaceMark + @EnQuad
+ @EmQuad + @EnSpace + @EmSpace + @ThreePerEmSpace
+ @FourPerEmSpace + @SixPerEmSpace + @FigureSpace
+ @PunctuationSpace + @ThinSpace + @HairSpace
+ @NarrowNoBreakSpace + @MediumMathematicalSpace + @IdeographicSpace;
SELECT REPLACE(@OghamSpaceMark, @OghamSpaceMark, 'a'),
TRANSLATE(@OghamSpaceMark, @OghamSpaceMark, 'a'),
REPLACE(@Space + @OghamSpaceMark, @OghamSpaceMark, 'a'),
TRANSLATE(@Space + @OghamSpaceMark + @Space, @OghamSpaceMark, 'a'),
dbo.ufnRemoveSpaceCharacters(@SpaceCharacters),
REPLACE(@SpaceCharacters, @OghamSpaceMark, 'a')
with output
(No column name) | (No column name) | (No column name) | (No column name) | (No column name) | (No column name) |
---|---|---|---|---|---|
a | a | a | aa aaaaaaaaaaa a |
I also ran the following to provide additional info
SELECT os_language_version,
SERVERPROPERTY('LCID') AS 'Instance-LCID',
SERVERPROPERTY('Collation') AS 'Instance-Collation',
SERVERPROPERTY('ComparisonStyle') AS 'Instance-ComparisonStyle',
SERVERPROPERTY('SqlSortOrder') AS 'Instance-SqlSortOrder',
SERVERPROPERTY('SqlSortOrderName') AS 'Instance-SqlSortOrderName',
SERVERPROPERTY('SqlCharSet') AS 'Instance-SqlCharSet',
SERVERPROPERTY('SqlCharSetName') AS 'Instance-SqlCharSetName',
DATABASEPROPERTYEX(N'{database_name}', 'LCID') AS 'Database-LCID',
DATABASEPROPERTYEX(N'{database_name}', 'Collation') AS 'Database-Collation',
DATABASEPROPERTYEX(N'{database_name}', 'ComparisonStyle') AS 'Database-ComparisonStyle',
DATABASEPROPERTYEX(N'{database_name}', 'SQLSortOrder') AS 'Database-SQLSortOrder'
FROM sys.dm_os_windows_info;
with output
os_language_version | Instance-LCID | Instance-Collation | Instance-ComparisonStyle | Instance-SqlSortOrder | Instance-SqlSortOrderName | Instance-SqlCharSet | Instance-SqlCharSetName | Database-LCID | Database-Collation | Database-ComparisonStyle | Database-SQLSortOrder |
---|---|---|---|---|---|---|---|---|---|---|---|
1033 | 1033 | SQL_Latin1_General_CP1_CI_AS | 196609 | 52 | nocase_iso | 1 | iso_1 | NULL | NULL | NULL | NULL |
@JeroenMostert Answered this with:
In the last line of your function, add COLLATE Latin1_General_BIN2 after @Value. Welcome to the wonderful wacky world of how collations treat character clusters; fortunately the binary collations just do as they're told.
(Unfortunately it's not permitted to mark parameters or output types with a COLLATE clause, as this would arguably be much easier than having to figure out exactly where it has to go in every other spot.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With