
SQL find-and-replace regular-expression capturing-group limit?

I need to convert data from a spreadsheet into insert statements in SQL. I've worked out most of the regular expressions for using the find and replace tool in SSMS, but I'm running into an issue when trying to reference the 9th parenthesized item in my final replace.

Here is the original record:

Blue Doe 12/21/1967 1126 Queens Highway Torrance CA 90802 N 1/1/2012

And this is what I need (for now):

select 'Blue','Doe','19671221','1126 Queens Highway','Torrance','CA','90802','N','20120101'

Because of the limit on the number of parenthesized items allowed, I have to run the replace three times. This may turn into a stored procedure if I can first make it work as a POC.

This is the first matching expression:

^{:w:b:w:b}{:z}/{:z}/{:z:b[0-9A-Za-z:b]+:b:w:b[A-Z]+:b:z:b:w:b}{:z}/{:z}/{:z}

And the replace: \10\2/0\3/\40\5/0\6/\7

This adds zeros to the months and days so that they have at least two characters.
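For readers who want to try the same step outside SSMS, here is a minimal Python sketch of the zero-prefixing pass. It is not the SSMS expression: the `DATE` pattern and the `prefix_zero` helper are hypothetical names chosen for illustration, and Python's `re` syntax differs from the SSMS find/replace syntax used above.

```python
import re

record = "Blue Doe 12/21/1967 1126 Queens Highway Torrance CA 90802 N 1/1/2012"

# Hypothetical pattern: month/day/year, each a run of digits.
DATE = re.compile(r"(\d+)/(\d+)/(\d+)")

def prefix_zero(line):
    """Put a '0' in front of every month and day, as the first pass does."""
    return DATE.sub(r"0\1/0\2/\3", line)

padded = prefix_zero(record)
print(padded)
# Blue Doe 012/021/1967 1126 Queens Highway Torrance CA 90802 N 01/01/2012
```

The months and days that already had two digits now have three, which is why a later pass has to keep only the last two.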

The next match reformats the dates into the format required in the query. (Please, no comments about not using a date field; this is a client requirement for the database.)

Matching expression:

^{:w:b:w:b}[0-9]*{[0-9]^2}/[0-9]*{[0-9]^2}/{:z}{:b[0-9A-Za-z:b]+:b:w:b[A-Z]+:b:z:b:w:b}[0-9]*{[0-9]^2}/[0-9]*{[0-9]^2}/{:z}

And the replace: \1\4\(2,2)\(2,3)\5\8\(2,6)\(2,7)
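The same idea can be sketched in Python, assuming the line has already had zeros prefixed to its months and days. The `DATE` pattern and `compact_dates` helper are hypothetical; the point is that a greedy `\d*` in front of a two-digit group backtracks until exactly the last two digits are left for the group.

```python
import re

padded = "Blue Doe 012/021/1967 1126 Queens Highway Torrance CA 90802 N 01/01/2012"

# \d*(\d{2}) : the greedy \d* gives back digits until the group holds the last two.
DATE = re.compile(r"\d*(\d{2})/\d*(\d{2})/(\d+)")

def compact_dates(line):
    """Rewrite each MM/DD/YYYY-style date as YYYYMMDD."""
    return DATE.sub(r"\3\1\2", line)

compact = compact_dates(padded)
print(compact)
# Blue Doe 19671221 1126 Queens Highway Torrance CA 90802 N 20120101
```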

Finally, the last match wraps the results in the select statement that will feed the insert.

Matching expression:

^{:w}:b{:w}:b{:z}:b{[0-9A-Za-z:b]+}:b{:w}:b{[A-Z]+}:b{:z}:b{:w}:b{:z}

And the replace: select '\1','\2','\3','\4','\5','\6','\7','\8','\9'

It all works except the last replacement. For some reason the \9 is NOT getting the data from the match. If I just replace the whole replace expression with \9 I get a blank space. If I use \8, I get N. If I eliminate the 8th parenthesized item, thus making my 9th item eighth, it returns what I want, 20120101.

So my question is, does SSMS / SQL allow for 9 tagged expressions when using find / replace and regular expressions? Or am I missing something here? I know there are other ways to do this. I'm just trying to get it done quickly as a POC before we move this into a sproc or application.
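One way to check that the pattern itself is sound, independent of SSMS, is to translate it into another regex engine. The sketch below is a rough Python translation under stated assumptions: `:w` is taken as a run of letters, `:b` as a single space or tab, and `:z` as a run of digits. The names `W`, `B`, `Z`, `ADDR`, and `PATTERN` are made up for this example.

```python
import re

# Assumed equivalents of the SSMS shorthands (not an official mapping).
W = r"[A-Za-z]+"          # :w
B = r"[ \t]"              # :b
Z = r"[0-9]+"             # :z
ADDR = r"[0-9A-Za-z \t]+" # [0-9A-Za-z:b]+

line = "Blue Doe 19671221 1126 Queens Highway Torrance CA 90802 N 20120101"

# Nine capturing groups, in the same order as the tagged expressions.
PATTERN = re.compile(
    rf"^({W}){B}({W}){B}({Z}){B}({ADDR}){B}({W}){B}([A-Z]+){B}({Z}){B}({W}){B}({Z})"
)

match = PATTERN.match(line)
print(match.group(9))
# 20120101

result = PATTERN.sub(r"select '\1','\2','\3','\4','\5','\6','\7','\8','\9'", line)
print(result)
# select 'Blue','Doe','19671221','1126 Queens Highway','Torrance','CA','90802','N','20120101'
```

Here the ninth group captures the final date, which suggests the empty `\9` comes from the tool rather than from the expression.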

Thanks for any assistance. -Peter

Asked by Peter Anderson, Mar 29 '12 17:03





1 Answer

None of your matching expressions work against the record you provided when I run them in my MS SQL Server Management Studio 2008 R2.

From your description it sounds like there is an issue with the Tagged Expression 9 since the desired result is returned when using Tagged Expression 8, but not 9. You may want to ask Microsoft or report it as a bug.

A quicker solution would be to move the text you are running the Find/Replace on out of SSMS and into a spreadsheet, then use cell formulas to parse the data into insert commands. If you have MS Excel, the CONCATENATE, FIND, and MID functions will probably be useful. It also helps to split the values into their own columns so you can format the dates, then use a single CONCATENATE to build your insert.

Please let me know if you need an example.

Update: I tried your example in MS SQL Server Management Studio 2008 R2, Visual Studio 2005, and Visual Studio 2010 and got the same result you did: \9 returns an empty string. Looking around, I found that others are also hitting this issue (see the community content from Henrique Evaristo) and that the whole system has been replaced in the newer editors.

So in answer to your question, SSMS does not support 9 tagged expressions due to a bug.

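If you are unable to use the spreadsheet idea, you could try splitting the action into two parts: set the first 8 values, then make a second pass for the last one. The first replace puts the whole matched record into the ninth slot with \0, and the second reduces that copy to its final number. For example: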

^{:w}:b{:w}:b{:z}:b{[0-9A-Za-z:b]+}:b{:w}:b{[A-Z]+}:b{:z}:b{:w}:b:z
select '\1','\2','\3','\4','\5','\6','\7','\8','\0'

:w:b:w:b:z:b[0-9A-Za-z:b]+:b:w:b[A-Z]+:b:z:b:w:b{:z}
\1
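To see the logic of the two passes without SSMS, here is a hedged Python sketch of the same idea. Python has no nine-group limit, so this only illustrates the mechanics. The shorthand translations (`W`, `B`, `Z`, `ADDR`) are assumptions, and `\g<0>` is Python's way of writing the whole match in a replacement.

```python
import re

# Assumed equivalents of :w, :b, :z and [0-9A-Za-z:b]+ (not an official mapping).
W, B, Z = r"[A-Za-z]+", r"[ \t]", r"[0-9]+"
ADDR = r"[0-9A-Za-z \t]+"

line = "Blue Doe 19671221 1126 Queens Highway Torrance CA 90802 N 20120101"

# Pass 1: eight groups; the ninth field is matched but not captured,
# and the whole matched record is copied into the last slot.
pass1 = re.compile(
    rf"^({W}){B}({W}){B}({Z}){B}({ADDR}){B}({W}){B}([A-Z]+){B}({Z}){B}({W}){B}{Z}"
)
step1 = pass1.sub(r"select '\1','\2','\3','\4','\5','\6','\7','\8','\g<0>'", line)

# Pass 2: find the embedded copy of the record and keep only its last number.
pass2 = re.compile(
    rf"{W}{B}{W}{B}{Z}{B}{ADDR}{B}{W}{B}[A-Z]+{B}{Z}{B}{W}{B}({Z})"
)
step2 = pass2.sub(r"\1", step1)

print(step2)
# select 'Blue','Doe','19671221','1126 Queens Highway','Torrance','CA','90802','N','20120101'
```

The second pattern only matches the copy inside the last pair of quotes, because the earlier values are separated by quotes and commas rather than spaces.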
Answered by Trisped, Oct 23 '22 12:10
