You can use RegEx in many languages like PHP, Python, and also SQL. RegEx lets you match patterns by character class (like all letters, or just vowels, or all digits), between alternatives, and other really flexible options.
An expression is a block of code in a T-SQL statement that is processed as a unit in order to return a scalar (single) data value. Each expression is made up of one or more types of components, including constants, columns, variables, operators, scalar functions, and scalar subqueries.
SQL pattern matching allows you to search for patterns in data if you don't know the exact word or phrase you are seeking. This kind of SQL query uses wildcard characters to match a pattern, rather than specifying it exactly. For example, you can use the wildcard "C%" to match any string beginning with a capital C.
How about the PATINDEX function?
The pattern matching in TSQL is not a complete regex library, but it gives you the basics.
(From Books Online)
Wildcard Meaning
% Any string of zero or more characters.
_ Any single character.
[ ] Any single character within the specified range
(for example, [a-f]) or set (for example, [abcdef]).
[^] Any single character not within the specified range
(for example, [^a - f]) or set (for example, [^abcdef]).
If anybody is interested in using regex with CLR here is a solution. The function below (C# .net 4.5) returns a 1 if the pattern is matched and a 0 if the pattern is not matched. I use it to tag lines in sub queries. The SQLfunction attribute tells sql server that this method is the actual UDF that SQL server will use. Save the file as a dll in a place where you can access it from management studio.
// default using statements above
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
namespace CLR_Functions
{
public class myFunctions
{
[SqlFunction]
public static SqlInt16 RegexContain(SqlString text, SqlString pattern)
{
SqlInt16 returnVal = 0;
try
{
string myText = text.ToString();
string myPattern = pattern.ToString();
MatchCollection mc = Regex.Matches(myText, myPattern);
if (mc.Count > 0)
{
returnVal = 1;
}
}
catch
{
returnVal = 0;
}
return returnVal;
}
}
}
In management studio import the dll file via programability -- assemblies -- new assembly
Then run this query:
CREATE FUNCTION RegexContain(@text NVARCHAR(50), @pattern NVARCHAR(50))
RETURNS smallint
AS
EXTERNAL NAME CLR_Functions.[CLR_Functions.myFunctions].RegexContain
Then you should have complete access to the function via the database you stored the assembly in.
Then use in queries like so:
SELECT *
FROM
(
SELECT
DailyLog.Date,
DailyLog.Researcher,
DailyLog.team,
DailyLog.field,
DailyLog.EntityID,
DailyLog.[From],
DailyLog.[To],
dbo.RegexContain(Researcher, '[\p{L}\s]+') as 'is null values'
FROM [DailyOps].[dbo].[DailyLog]
) AS a
WHERE a.[is null values] = 0
There is some basic pattern matching available through using LIKE, where % matches any number and combination of characters, _ matches any one character, and [abc] could match a, b, or c... There is more info on the MSDN site.
In case anyone else is still looking at this question, http://www.sqlsharp.com/ is a free, easy way to add regular expression CLR functions into your database.
If you are using SQL Server 2016 or above, you can use sp_execute_external_script
along with R. It has functions for Regular Expression searches, such as grep
and grepl
.
Here's an example for email addresses. I'll query some "people" via the SQL Server database engine, pass the data for those people to R, let R decide which people have invalid email addresses, and have R pass back that subset of people to SQL Server. The "people" are from the [Application].[People]
table in the [WideWorldImporters]
sample database. They get passed to the R engine as a dataframe named InputDataSet
. R uses the grepl function with the "not" operator (exclamation point!) to find which people have email addresses that don't match the RegEx string search pattern.
EXEC sp_execute_external_script
@language = N'R',
@script = N' RegexWithR <- InputDataSet;
OutputDataSet <- RegexWithR[!grepl("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4}))", RegexWithR$EmailAddress), ];',
@input_data_1 = N'SELECT PersonID, FullName, EmailAddress FROM Application.People'
WITH RESULT SETS (([PersonID] INT, [FullName] NVARCHAR(50), [EmailAddress] NVARCHAR(256)))
Note that the appropriate features must be installed on the SQL Server host. For SQL Server 2016, it is called "SQL Server R Services". For SQL Server 2017, it was renamed to "SQL Server Machine Learning Services".
Closing Thoughts Microsoft's implementation of SQL (T-SQL) doesn't have native support for RegEx. This proposed solution may not be any more desirable to the OP than the use of a CLR stored procedure. But it does offer an additional way to approach the problem.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With