How do I remove comments from an Oracle SQL script using pyparsing in Python 3.x?

Question

I have an oracle sql script containing multiple statements which are separated by ";". I'm trying to remove all comments within this entire script. This includes:

single line comments -- till a newline.
block comments /* ... */
Additionally, I would like to be able to split the entire script into its statements.

I know of the library called sqlparse that can format to remove code, but I came across this edge case:

edge_case = """myval := oneval || otherval ||--;
-- single comment line
lpad(nvl(myval, ' '), 10, ' ');
"""

formatted = sqlparse.format(edge_case, strip_comments=True)
sqlparse.split(formatted)

# returns ['myval := oneval || otherval ||--;', "lpad(nvl(myval, ' '), 10, ' ');"]
# it splits into 2 statements when it should only be 1!

My idea is to focus on each commenting case, and then try to parse the rest of non-commented code by just string tokens which consists of all valid characters in sql.

from pyparsing import *

# define basic elements
semicolon = Literal(";")
# all valid chars excluding semicolon
all_valid_chars = alphanums + "_-+*/.,:()[]{}<>=&|"

# define an sql token
sql_token = Word(all_valid_chars)


# need to make sure that "--" is a special case
single_line_comment_token = Literal("--")
single_line_comment = single_line_comment_token + SkipTo(line_end) + line_end
test1_single_line_comment = """
-- single comment line
--- also a comment
---- also a comment
-- comment -- continues
-- comment /* also continues */

# - not a comment
"""
single_line_comment.run_tests(test1_single_line_comment)


# also handle multi line comments
multi_line_comment_open = Literal("/*")
multi_line_comment_close = Literal("*/")
multi_line_comment = multi_line_comment_open + SkipTo(multi_line_comment_close) + multi_line_comment_close

test1_multi_line_comment = """
/* multi line comment */
/* outer comment /* inner comment */

/* /* /* still valid */
/* -- still valid */
"""
multi_line_comment.run_tests(test1_multi_line_comment)

test2_multi_line_comment = """
/* multi line comment /* with nested comment should fail! */ */
"""
multi_line_comment.run_tests(test2_multi_line_comment, failure_tests=True)

I'm now stuck here and not sure where to continue.

statement2 = OneOrMore(single_line_comment | multi_line_comment | sql_token) + semicolon

# run tests
print(statement2.parseString("myval := oneval || otherval ||--;"))

My issues:

The tests for each of these cases work, but I dont know how I should be testing the together to cover overlaps eg. -- /* this should be a single line comment etc.
I don't know how to handle the rest of the non commented code properly. For example, all valid characters should include -, but this should mess up my single line code? Less technical issues faced:
Honestly not sure where to even begin or how to structure my code since these cases can overlap. I'm not even sure if pyparsing is what I should be doing to solve this problem, but based on questions i've seen trying to remove comments from sql, parsing is the only real robust solution.
run_tests does not work with multiple lines. This makes it hard to test the block comment although i can use parse_string.

PaulMcG · Accepted Answer

For stripping comments (or any pyparsing expression), you should use transform_string, not parse_string. transform_string scans through the input string and applies parse actions and suppressions. Here is some code to strip out python comments from a script:

import pyparsing as pp

comment = "#" + pp.rest_of_line
# could also use the provided `python_style_comment`
# comment = pp.python_style_comment

python_source = """
def hello(s):
    # say a nice hello!
    print(f"Hi, {s}!")
    # add a hashtag for social media
    print("#hello")
"""

# suppress comments and transform the string
print(comment.suppress().transform_string(python_source))

giving:

def hello(s):
    
    print(f"Hi, {s}!")
    
    print("

Oops, this does not detect the fact that #hello is inside a quoted string.

To fix this, we also insert a parser for quoted strings, but these we don't suppress:

# parse quoted strings first, in case they contain a comment
transform_expr = pp.quoted_string | comment.suppress()
print(transform_expr.transform_string(python_source))

Now giving:

def hello(s):
    
    print(f"Hi, {s}!")
    
    print("#hello")

For your SQL comment handling, you'll do much the same:

sql_single_line_comment = '--' + pp.rest_of_line

# use pyparsing's definition for C-style /* ... */ multiline comments
sql_multi_line_comment = pp.c_style_comment

comment_remover = (
    # parse quoted strings separately so that they don't get suppressed
    pp.quoted_string
    | (sql_single_line_comment | sql_multi_line_comment).suppress()
)

sql = "SELECT * FROM STUDENTS; -- watch out for Bobby Tables!"
print(comment_remover.transform_string(sql))

prints:

SELECT * FROM STUDENTS;

I'm glad to see you are using run_tests! If you want to define tests that span multiple lines, than instead of using the multiline string form, pass the test as a list of strings:

expr.run_tests([test_str1, test_str2, test_str3])

Unfortunately, I don't have a version of run_tests that calls transform_string.

Splitting the lines up into separate statements is best done in a second pass, after first stripping out the comments. You might try something like:

semi = pp.Literal(";")
semi.add_parse_action(pp.replace_with(";

")

And then use the same transformation pattern as was used to strip comments (but this time, using the parse action instead of suppressing). Or use scan_string to locate the ";" terminators, and then write out the SQL that lies between semicolons out to their separate files (left as an exercise for the reader).

How do I remove comments from an Oracle SQL script using pyparsing in Python 3.x?

Tags:

oracle-database

sql

python-3.x

pyparsing

sql-parser

Chowder

1 Answers

PaulMcG

Recent Activity

Donate For Us

How do I remove comments from an Oracle SQL script using pyparsing in Python 3.x?

Tags:

oracle-database

sql

python-3.x

pyparsing

sql-parser

Chowder

1 Answers

PaulMcG

Related questions

Recent Activity

Donate For Us