I have a problem removing comments and empty lines from an existing SQL file. The file has over 10k lines, so cleaning it manually is not an option.
I have a small Python script, but I have no idea how to handle comments inside multi-line inserts.
with open('file.sql', 'r') as f:
    # keep lines that are neither '--' comments nor whitespace-only
    t = [x for x in f if not x.startswith('--') and not x.isspace()]
t  # <- here the cleaned data should be
This should be cleaned:
-- normal sql comment
This should stay as it is:
CREATE FUNCTION func1(a integer) RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
-- comment
[...]
END;
$$;
INSERT INTO public.texts (multilinetext) VALUES ('
and more lines here \'
-- part of text
\'
[...]
');
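One way to approach the two cases above with only the standard library is to track, line by line, whether the scanner is currently inside a single-quoted string or a `$$ ... $$` body, and to drop comment and blank lines only when it is outside both. The following is a minimal sketch, not a full SQL parser: the function name `drop_comment_lines` is made up for illustration, it assumes backslash-escaped quotes (`\'`) as in the example above, and it assumes dollar quoting always uses an empty tag (`$$`).

```python
def drop_comment_lines(sql):
    """Remove '--' comment lines and blank lines found outside quoted text."""
    kept = []
    in_string = False  # inside a single-quoted literal
    in_dollar = False  # inside a $$ ... $$ body (empty tag assumed)
    for line in sql.splitlines(keepends=True):
        outside = not (in_string or in_dollar)
        text = line.strip()
        if outside and (not text or text.startswith("--")):
            continue  # top-level comment or empty line: drop it
        kept.append(line)
        # Scan the kept line to track the quoting state for the next line.
        i = 0
        while i < len(line):
            if in_string:
                if line[i] == "\\" or line.startswith("''", i):
                    i += 2  # skip an escaped character or a doubled quote
                    continue
                if line[i] == "'":
                    in_string = False
            elif in_dollar:
                if line.startswith("$$", i):
                    in_dollar = False
                    i += 2
                    continue
            elif line.startswith("--", i):
                break  # trailing comment: ignore any quotes after it
            elif line[i] == "'":
                in_string = True
            elif line.startswith("$$", i):
                in_dollar = True
                i += 2
                continue
            i += 1
    return "".join(kept)
```

Reading the file with `open('file.sql').read()` and passing the text to this function would return the cleaned text. Trailing comments after a statement on the same line are left in place, since the sketch only drops whole lines.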
Creating the procedure for removing comments: the procedure for stripping the comments has three steps. (1) Running EXEC sp_helptext to get the T-SQL query in an enumerated table. (2) Stripping and removing all slash-star (/* */) comments, whether inline or spanning multiple lines. (3) Removing single-line (--) comments.
The syntax for a comment in a line of SQL code is a double hyphen ( -- ) at the beginning of the line. The comment affects all of the SQL code in the line. Note: This process does not try to merge a new comment with an existing comment.
Syntax using the /* and */ symbols: a comment that starts with /* and ends with */ can appear anywhere in your SQL statement. This kind of comment can span several lines within your SQL.
There are three types of comments: single-line comments, multi-line comments, and inline comments.
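The comment forms described above can be removed in one pass with a regular expression that also matches quoted text, so that comment markers inside strings are left alone. This is a rough sketch under stated assumptions: the names `_SQL_TOKEN` and `strip_all_comments` are hypothetical, dollar quoting is assumed to use an empty tag (`$$`), and nested block comments are not handled.

```python
import re

_SQL_TOKEN = re.compile(
    r"""
    (                          # group 1: quoted text, kept verbatim
        '(?:[^'\\]|\\.|'')*'   #   single-quoted string literal
      | \$\$.*?\$\$            #   dollar-quoted body with an empty tag
    )
    | /\*.*?\*/                # multi-line or inline block comment
    | --[^\n]*                 # single-line comment, up to end of line
    """,
    re.VERBOSE | re.DOTALL,
)


def strip_all_comments(sql):
    """Remove -- and /* */ comments that lie outside quoted text."""
    # Quoted text is put back unchanged; comments are replaced by nothing.
    return _SQL_TOKEN.sub(lambda m: m.group(1) or "", sql)
```

Because quoted text is matched first and put back unchanged, a `--` or `/*` inside a string literal or a `$$` body survives. Lines that held only a comment become empty lines, so a separate pass may be needed to drop them.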
Adding an updated answer :)
import sqlparse
sql_example = """--comment
SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');
"""
print(sqlparse.format(sql_example, strip_comments=True).strip())
Output:
SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');
It achieves the same result, but it also covers other corner cases and is more concise.
Try the sqlparse module.
Updated example that leaves comments inside INSERT values and within CREATE FUNCTION blocks untouched. You can tweak it further to tune the behavior:
import sqlparse
from sqlparse import tokens
queries = '''
CREATE FUNCTION func1(a integer) RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
-- comment
END;
$$;
SELECT -- comment
* FROM -- comment
TABLE foo;
-- comment
INSERT INTO foo VALUES ('a -- foo bar');
INSERT INTO foo
VALUES ('
a
-- foo bar'
);
'''
IGNORE = set(['CREATE FUNCTION',])  # extend this

def _filter(stmt, allow=0):
    # collect the leading DDL/keyword tokens, e.g. "CREATE FUNCTION"
    ddl = [t for t in stmt.tokens if t.ttype in (tokens.DDL, tokens.Keyword)]
    start = ' '.join(d.value for d in ddl[:2])
    if ddl and start in IGNORE:
        allow = 1  # keep comments in statements listed in IGNORE
    for tok in stmt.tokens:
        if allow or not isinstance(tok, sqlparse.sql.Comment):
            yield tok

for stmt in sqlparse.split(queries):
    sql = sqlparse.parse(stmt)[0]
    print(sqlparse.sql.TokenList([t for t in _filter(sql)]))
Output:
CREATE FUNCTION func1(a integer) RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
-- comment
END;
$$;
SELECT * FROM TABLE foo;
INSERT INTO foo VALUES ('a -- foo bar');
INSERT INTO foo
VALUES ('
a
-- foo bar'
);