I am writing a Python function to process multi-line SQL statements, e.g.
multi_stmt = """
-- delete empty responses
DELETE FROM idlongDVR_responses WHERE new_response_code = '';
DELETE FROM idwideDVR_responses WHERE new_response_code = '';
-- create a current responses table for idlongDVR
DROP TABLE IF EXISTS idlongDVR_respCurr;
CREATE TABLE idlongDVR_respCurr
SELECT *, MAX(modifiedat) AS latest FROM idlongDVR_responses
GROUP BY sitecode, id, dass, tass, field, value, validation_message
ORDER BY sitecode, id, dass, tass; """
So I have written a regular expression that matches text starting at a newline that is not followed by a double hyphen (the start of a comment) and ending in a semicolon:
import re

sql_line = re.compile(r"""
    \n+           # starting from a new line sequence
    (?!(--|\n))   # if not followed by a comment start "--" or newline
    (.*?)         # <<<<< WHY ARE THESE CAPTURING BRACKETS NEEDED?
    ;             # ending with a semicolon
    """, re.DOTALL|re.VERBOSE|re.MULTILINE)
stmts = sql_line.findall(multi_stmt)
for stmt in stmts:
    stmt = stmt[1]  # second capturing group: the statement text
    if len(stmt) > 0:
        cursor.execute(stmt)
It works, but only if I enclose the .*? term in brackets so that it becomes (.*?). If I don't, then I don't match anything.
Why is this? Thanks in advance.
"These capturing brackets are needed" because you used a capturing group inside the negative lookahead:
(?!(--|\n))
   ^     ^
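A negative lookahead succeeds only when its contents fail to match, so this first capturing group is always empty in a successful match. And since methods such as .findall return only the capturing groups when the pattern has any, you'll see just a list of empty strings. Adding (.*?) gives the pattern a second group, so .findall returns tuples whose second element holds the statement, which is why stmt[1] works.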
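To see the difference, here is a small sketch comparing what re.findall returns with zero, one, and two capturing groups. The sample text and variable names are made up for illustration; only the patterns come from the question.

```python
import re

# Hypothetical sample input: two statements separated by a comment line.
text = "\nSELECT 1;\n-- comment\nSELECT 2;"

# No capturing groups: findall returns each whole match.
no_groups = re.findall(r"\n+(?!--|\n).*?;", text, re.DOTALL)

# One capturing group, inside the negative lookahead: findall returns
# only that group, which never participates in a successful match.
one_group = re.findall(r"\n+(?!(--|\n)).*?;", text, re.DOTALL)

# Two capturing groups: findall returns tuples of (group 1, group 2).
two_groups = re.findall(r"\n+(?!(--|\n))(.*?);", text, re.DOTALL)

print(no_groups)   # ['\nSELECT 1;', '\nSELECT 2;']
print(one_group)   # ['', '']
print(two_groups)  # [('', 'SELECT 1'), ('', 'SELECT 2')]
```

The comment line is skipped in all three cases; only the shape of the returned list changes.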
Removing the (...) here should make the regex behave as you expect. By the way, you could use [^;]* instead of .*?, which also matches across newlines without needing re.DOTALL:
sql_line = re.compile(r"\n+(?!--|\n)[^;]*;")
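As a rough sketch of how that simplified pattern could be used, the snippet below runs it over a shortened version of the SQL from the question. Stripping the leading newline with strip() and collecting the results in a list named stmts are my own assumptions, not part of the original code. Note that a semicolon inside a string literal or a comment would still split a statement early.

```python
import re

sql_line = re.compile(r"\n+(?!--|\n)[^;]*;")

# Shortened sample based on the SQL in the question.
multi_stmt = """
-- delete empty responses
DELETE FROM idlongDVR_responses WHERE new_response_code = '';
DELETE FROM idwideDVR_responses WHERE new_response_code = '';
-- create a current responses table for idlongDVR
DROP TABLE IF EXISTS idlongDVR_respCurr;
"""

# With no capturing groups, findall yields the whole match, including
# the leading newline(s), so strip the surrounding whitespace.
stmts = [match.strip() for match in sql_line.findall(multi_stmt)]

for stmt in stmts:
    print(stmt)  # in real code this would be cursor.execute(stmt)
```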