
Regular expression to remove comments from SQL statement

I'm trying to come up with a regular expression to remove comments from an SQL statement.

This regex almost works:

(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|'(?:[^']|'')*'|(--.*)

Except that the last part doesn't handle "--" comments very well. The problem is handling SQL strings, which are delimited with single quotes.

For example, if I have

SELECT ' -- Hello -- ' FROM DUAL

It shouldn't match, but it's matching.

This is in ASP/VBScript.

I've thought about matching right-to-left, but I don't think VBScript's regex engine supports it. I also tried fiddling with negative lookbehind, but the results weren't good.
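To make the symptom concrete, here is a minimal sketch that runs the pattern above against the example query. It is written in Python rather than VBScript purely for illustration, and it assumes that every match is replaced with an empty string, which the question does not state. Under that assumption the string literal is consumed by the middle alternative (the quoted-string branch), not by the "--" branch, so it gets deleted along with the real comments.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(
    r"(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|'(?:[^']|'')*'|(--.*)"
)

sql = "SELECT ' -- Hello -- ' FROM DUAL"

match = PATTERN.search(sql)
# The whole quoted literal is the match...
print(repr(match.group(0)))
# ...but group 5, the (--.*) branch, did not take part in it.
print(match.group(5))

# Replacing every match with nothing therefore removes the literal too.
print(repr(PATTERN.sub("", sql)))
```

The quoted-string branch is doing its job of shielding the "--" from the comment branch. What is missing is a way to put the string back after it has been matched.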

Nuno Leong asked Mar 13 '12 19:03


3 Answers

In PHP, I use this code to strip comments from SQL:

$sqlComments = '@(([\'"]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+@ms';
/* Commented version
$sqlComments = '@
    (([\'"]).*?[^\\\]\2) # $1 : Skip single & double quoted expressions
    |(                   # $3 : Match comments
        (?:\#|--).*?$    # - Single line comments
        |                # - Multi line (nested) comments
         /\*             #   . comment open marker
            (?: [^/*]    #   . non comment-marker characters
                |/(?!\*) #   . ! not a comment open
                |\*(?!/) #   . ! not a comment close
                |(?R)    #   . recursive case
            )*           #   . repeat eventually
        \*\/             #   . comment close marker
    )\s*                 # Trim after comments
    |(?<=;)\s+           # Trim after semi-colon
    @msx';
*/
$uncommentedSQL = trim( preg_replace( $sqlComments, '$1', $sql ) );
preg_match_all( $sqlComments, $sql, $comments );
$extractedComments = array_filter( $comments[ 3 ] );
var_dump( $uncommentedSQL, $extractedComments );
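The key idea in this answer is to match quoted strings in a capture group and write that group back in the replacement, so only comments are actually removed. A minimal Python sketch of the same idea follows. It is not a port of the PHP pattern: the names SQL_TOKEN and strip_sql_comments are made up for illustration, strings are assumed to escape quotes by doubling them (as in the question) rather than with backslashes, and nested block comments are not handled because Python's re module has no (?R) recursion.

```python
import re

# Alternative 1 (captured): single- or double-quoted strings, where a doubled
# quote is treated as an escaped quote. This is an assumption about the dialect.
# Alternatives 2 and 3 (not captured): "--" / "#" line comments and
# non-nested /* ... */ block comments.
SQL_TOKEN = re.compile(
    r"""('(?:[^']|'')*'|"(?:[^"]|"")*")"""
    r"""|(?:--|#)[^\r\n]*"""
    r"""|/\*.*?\*/""",
    re.DOTALL,  # lets ".*?" span several lines inside a block comment
)


def strip_sql_comments(sql: str) -> str:
    # If group 1 took part in the match it is a string literal: keep it.
    # Otherwise the match is a comment: replace it with nothing.
    return SQL_TOKEN.sub(lambda m: m.group(1) or "", sql)


print(strip_sql_comments("SELECT ' -- Hello -- ' FROM DUAL"))
print(repr(strip_sql_comments("SELECT 1 -- one\nFROM DUAL /* x */")))
```

In an engine without replacement callbacks, the same effect should be achievable by replacing each match with a backreference to the string group, as the PHP code does with '$1'.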
Adrien Gibrat answered Oct 20 '22 01:10


This code works for me:

function strip_sqlcomment ($string = '') {
    $RXSQLComments = '@(--[^\r\n]*)|(\#[^\r\n]*)|(/\*[\w\W]*?(?=\*/)\*/)@ms';
    return (($string == '') ?  '' : preg_replace( $RXSQLComments, '', $string ));
}

With a small tweak to the regex, it could be used to strip comments in other languages too.
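One caveat worth checking: this pattern has no branch for quoted strings, so it cannot tell a "--" inside a literal from a real comment. The sketch below is a Python transcription of the same three alternatives (the Python names are chosen for illustration), run against an ordinary query and against the query from the question.

```python
import re

# The same three alternatives as $RXSQLComments:
# "--" comments, "#" comments, and /* ... */ blocks.
RX_SQL_COMMENTS = re.compile(r"(--[^\r\n]*)|(#[^\r\n]*)|(/\*[\w\W]*?(?=\*/)\*/)")


def strip_sqlcomment(string: str = "") -> str:
    return RX_SQL_COMMENTS.sub("", string)


# Ordinary comments are removed as intended.
print(repr(strip_sqlcomment("SELECT 1 /* one */ FROM DUAL -- done")))

# The "--" inside the string literal is treated as a comment start,
# so the rest of the line is lost.
print(repr(strip_sqlcomment("SELECT ' -- Hello -- ' FROM DUAL")))
```

So the function is fine for SQL whose string literals never contain comment markers, but it does not address the case the question asks about.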

gonzalezea answered Oct 20 '22 00:10


Since you said the rest of your regex is fine, I focused on the last part. All you need to do is verify that the -- is at the beginning of the line, and then make sure all the dashes are removed if there are more than two. The resulting regex is below:

(^-{2,})

The above only removes the comment dashes, not the whole line. If you also want to remove everything after them up to the end of the line, use this instead:

(^--.*)
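A quick way to see what the anchored pattern does and does not cover is the Python sketch below (the variable names and the sample SQL are illustrative). It assumes the engine's multiline mode is on, so that ^ matches at the start of every line; in VBScript that should correspond to the RegExp object's MultiLine property.

```python
import re

# With MULTILINE, "^" matches at the start of every line, not just the input.
LINE_COMMENT = re.compile(r"^--.*", re.MULTILINE)

sql = "-- header comment\nSELECT ' -- Hello -- ' FROM DUAL -- trailing\n"

# The header comment is removed and the string literal is untouched,
# but the trailing comment after the statement is left in place.
print(repr(LINE_COMMENT.sub("", sql)))

# Without MULTILINE, a comment on a later line is not matched at all.
print(repr(re.sub(r"^--.*", "", "SELECT 1\n-- c")))
```

Anchoring sidesteps the string problem, at the cost of missing any comment that follows code on the same line.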
Justin Pihony answered Oct 20 '22 00:10