
regex for parsing SQL statements

Tags: python, regex

I've got an IronPython script that executes a bunch of SQL statements against a SQL Server database. The statements are large strings that actually contain multiple statements, separated by the "GO" keyword. That works when they're run from SQL Server Management Studio and some other tools, but not in ADO. So I split up the strings using the Python 2.5 "re" module like so:

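import re

# Split wherever the word GO appears, in any letter case.
splitter = re.compile(r'\bGO\b', re.IGNORECASE)
for script in splitter.split(scriptBlob):
    if script.strip():  # skip empty or whitespace-only chunks
        [... execute the query ...]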

This breaks in the rare case that the word "go" appears in a comment or a string. How in the heck would I work around that, i.e. correctly parse this string into two scripts:

-- this is a great database script!  go team go!
INSERT INTO myTable(stringColumn) VALUES ('go away!')
/*
  here are some comments that go with this script.
*/
GO
INSERT INTO myTable(stringColumn) VALUES ('this is the next script')

EDIT:

I searched more and found this SQL documentation: http://msdn.microsoft.com/en-us/library/ms188037(SQL.90).aspx

As it turns out, GO must be on its own line, as some answers suggested. However, it can be followed by a "count" integer, which will actually execute the statement batch that many times (has anybody actually used that before??), and it can be followed by a single-line comment on the same line (but not a multi-line comment; I tested this). So the magic regex would look something like:

"(?m)^\s*GO\s*\d*\s*$"

Except this doesn't account for:

  • a possible single-line comment ("--" followed by any character except a line break) at the end.
  • the whole line being inside a larger multi-line comment.

I'm not concerned about capturing the "count" argument and using it. Now that I have some technical documentation, I'm tantalizingly close to writing this "to spec" and never having to worry about it again.
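One way to cover both gaps listed above is to go through the blob line by line instead of using a single regex, remembering whether the current line starts inside a /* ... */ comment or a quoted string. The following is only a rough sketch under some assumptions: the names split_batches, GO_LINE and TOKEN are made up for illustration, the count must be separated from GO by whitespace, a GO line inside a comment or a string literal is assumed not to separate batches, and nested block comments are not handled.

```python
import re

# GO alone on a line, with an optional count and an optional trailing "--" comment.
GO_LINE = re.compile(r'^\s*GO(?:\s+\d+)?\s*(?:--.*)?$', re.IGNORECASE)

# Tokens that can change the scanner state within a line.
TOKEN = re.compile(r"--|/\*|\*/|'")

def split_batches(script_blob):
    """Split a SQL blob into batches on GO lines outside comments and strings."""
    batches = []
    current = []
    in_comment = False  # True while inside /* ... */
    in_string = False   # True while inside '...'

    def flush():
        batch = "\n".join(current).strip()
        if batch:
            batches.append(batch)
        del current[:]

    for line in script_blob.splitlines():
        if not in_comment and not in_string and GO_LINE.match(line):
            flush()
            continue
        current.append(line)
        # Update the state from the tokens found on this line.
        for tok in TOKEN.findall(line):
            if in_comment:
                if tok == "*/":
                    in_comment = False
            elif in_string:
                if tok == "'":
                    in_string = False
            elif tok == "--":
                break  # the rest of the line is a single-line comment
            elif tok == "/*":
                in_comment = True
            elif tok == "'":
                in_string = True
    flush()
    return batches
```

Called on the example script above, split_batches(scriptBlob) should return two batches, with the "go" words in the comments and in 'go away!' left alone. A doubled quote ('') inside a string simply closes and reopens the string state, so it needs no special case.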

James Orr asked Apr 22 '09 20:04


1 Answer

Is "GO" always on a line by itself? You could just split on "^GO$".
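For what it's worth, a pattern like "^GO$" only matches per line when Python's re.MULTILINE flag is set; without it, ^ and $ anchor only at the start and end of the whole string. A minimal sketch of this approach follows. The names go_line and split_on_go are just illustrative, and allowing spaces, tabs and a trailing carriage return around GO is an added assumption, not part of the suggestion itself.

```python
import re

# A line containing only GO (any case), optionally padded with spaces or tabs.
# The trailing \r tolerates Windows line endings.
go_line = re.compile(r'^[ \t]*GO[ \t\r]*$', re.IGNORECASE | re.MULTILINE)

def split_on_go(script_blob):
    """Split on GO-only lines and drop empty chunks."""
    parts = go_line.split(script_blob)
    return [part.strip() for part in parts if part.strip()]
```

This leaves "go" inside a single-line comment or a one-line string alone, since those lines contain other text. It would still split on a GO that sits on its own line inside a multi-line comment.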

mcassano answered Sep 28 '22 02:09