Some weird characters are getting stored in one of my tables. They seem to be coming from .csv
feeds, so I don't have much control over that.
Hello Kitty Essential Accessory Kit
How can I clean the data and remove these characters? I am OK with doing it at the DB level or in C#.
EDIT
As per the suggestions received in the comments, I am also looking into what I can do to correct it at the feed level. Here's more info on it.
You can use the .NET regular expression functions, for example Regex.Replace:
Regex.Replace(s, @"[^\u0000-\u007F]", string.Empty);
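As a slightly fuller sketch of the same call on the C# side: the class name `FeedCleaner`, the method name `StripNonAscii`, and the sample input are illustrative assumptions, not part of the original answer. The trademark sign (U+2122) merely stands in for whatever stray characters the feed actually contains.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FeedCleaner
{
    // Removes every character outside the 7-bit ASCII range (U+0000 to U+007F).
    public static string StripNonAscii(string input)
    {
        if (input == null) return null;
        return Regex.Replace(input, @"[^\u0000-\u007F]", string.Empty);
    }

    public static void Main()
    {
        // Hypothetical input: U+2122 plays the role of the unwanted characters.
        string dirty = "Hello Kitty\u2122 Essential Accessory Kit";
        Console.WriteLine(StripNonAscii(dirty)); // Hello Kitty Essential Accessory Kit
    }
}
```

Note that this pattern also strips legitimate non-ASCII text such as accented letters, so it only fits feeds that are expected to be plain ASCII.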
As there is no support for regular expressions in SQL Server, you need to create a SQL CLR function. More information about the .NET integration in SQL Server can be found here:
In your case:
Open Visual Studio and create a Class Library project.
Then rename the class to StackOverflow and paste the following code into its file:
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;

public class StackOverflow
{
    // Exposed to SQL Server as a scalar CLR function that wraps Regex.Replace.
    [SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, Name = "RegexReplace")]
    public static SqlString Replace(SqlString sqlInput, SqlString sqlPattern, SqlString sqlReplacement)
    {
        // NULL arguments are treated as empty strings, so the function never returns NULL.
        string input = sqlInput.IsNull ? string.Empty : sqlInput.Value;
        string pattern = sqlPattern.IsNull ? string.Empty : sqlPattern.Value;
        string replacement = sqlReplacement.IsNull ? string.Empty : sqlReplacement.Value;

        return new SqlString(Regex.Replace(input, pattern, replacement));
    }
}
Now build the project. Open SQL Server Management Studio, select your database, and change the path in the following FROM clause to point to your StackOverflow.dll (CLR integration must be enabled on the instance, via sp_configure 'clr enabled', for the function to run):
CREATE ASSEMBLY [StackOverflow] FROM 'C:\Users\gotqn\Desktop\StackOverflow\StackOverflow\bin\Debug\StackOverflow.dll';
Finally, create the SQL CLR function:
CREATE FUNCTION [dbo].[StackOverflowRegexReplace] (@input NVARCHAR(MAX), @pattern NVARCHAR(MAX), @replacement NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS EXTERNAL NAME [StackOverflow].[StackOverflow].[Replace]
GO
You are now ready to use the RegexReplace .NET function directly in your T-SQL statements:
SELECT [dbo].[StackOverflowRegexReplace] ('Hello Kitty Essential Accessory Kit', '[^\u0000-\u007F]', '')
-- Hello Kitty Essential Accessory Kit
If you want to keep only letters and digits in a string, then this can help you out.
Here, Regex is used to remove every character other than letters and digits.
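The snippet for this answer did not survive, so the following is only a minimal sketch of what such a filter might look like. The class name `AlphanumericFilter`, the method name `KeepLettersAndDigits`, the exact pattern, and the sample input are all assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlphanumericFilter
{
    // Removes every character that is not an ASCII letter (a-z, A-Z) or digit (0-9).
    public static string KeepLettersAndDigits(string input)
    {
        if (input == null) return null;
        return Regex.Replace(input, "[^a-zA-Z0-9]", string.Empty);
    }

    public static void Main()
    {
        // Hypothetical sample input; spaces and punctuation are removed as well.
        Console.WriteLine(KeepLettersAndDigits("Kit #42 (new)!")); // Kit42new
    }
}
```

Because spaces are removed too, a product title would lose its word breaks; adding a space inside the character class (`[^a-zA-Z0-9 ]`) would preserve them.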