i have a similar problem like this: .NET Regex for SQL Server string... but not Unicode string?
The RegEx (?:N'(?:''|[^'])*'[^']*)*(?<!N)'(?<value>(?:''|[^'])*)' doesn't match this string correctly:
Insert into SomeTable (someColumns) values ('someValue', N'someValue', 'someValue')
it recognizes "N'someValue', 'someValue'" as a Match
I cant figure out to correct the RegEx to match all string literals but not the literals with the N-Prefix.
Like mentioned in the Link above the RegEx have to ignore escaped quotes in the space of the string like 'some '' escaped'
In my opinion, there is a better tool for your job - the TSql100Parser class:
using Microsoft.Data.Schema.ScriptDom;
using Microsoft.Data.Schema.ScriptDom.Sql;
using System.Collections.Generic;
using System.IO;
using System.Linq;
class Program
{
static void Main(string[] args)
{
IList<ParseError> errors = new List<ParseError>();
var tsql = @"
Insert into SomeTable (someColumns)
values ('someValue1',
N'someValue2',
'someValue3',
'some '' escaped')";
var result = GetLiterals(tsql);
}
private static List<string>
GetLiterals(string strQuery)
{
var parser = new TSql100Parser(false);
IList<ParseError> errors = new List<ParseError>();
var result =
parser.GetTokenStream(new StringReader(strQuery), errors);
return result
.Where(t =>
t.TokenType == TSqlTokenType.AsciiStringLiteral ||
t.TokenType == TSqlTokenType.UnicodeStringLiteral)
.Select(t => t.Text)
.ToList();
}
}
You can't use Type-3 grammars (regular expression) to parse Type-0 grammars (T-SQL). The same stands when you try to parse HTML. It will not be 100% fail proof in real life.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With