Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

split sql statements in php on semicolons (but not inside quotes)

I have a system that is causing errors when users use a semicolon in a free format field. I have traced it down to a simple explode statement:

$array = explode( ";", $sql );

Because this line is in a subroutine that is called from all over the system I would like to replace this line with something that will split things properly, without breaking the rest of the system. I thought I was onto a winner with str_getcsv, but that isn't sophisticated enough either. Look at the following example

$sql = "BEGIN;INSERT INTO TABLE_A (a, b, c) VALUES('42', '12', '\'ab\'c; DEF');INSERT INTO TABLE_B (d, e, f) VALUES('42', '43', 'XY\'s Z ;uvw') ON DUPLICATE KEY UPDATE f='XY\'s Z ;uvw';COMMIT;";

$array = str_getcsv($sql, ";", "'");
foreach( $array as $value ) {
    echo $value . "<br><br>";
}

When I run this is outputs the following:

BEGIN

INSERT INTO TABLE_A (a, b, c) VALUES('42', '12', '\'ab\'c

DEF')

INSERT INTO TABLE_B (d, e, f) VALUES('42', '43', 'XY\'s Z

uvw') ON DUPLICATE KEY UPDATE f='XY\'s Z

uvw'

COMMIT

So it doesn't notice the semicolons are inside quotes. (As far as I can see the quoted strings from different places in the system are always in single quotes, but it is possible that at times they are double quotes, I am not sure about that.)

Can anyone tell me how to do this? I suspect I can do this with a very complicated regex, but this is over my head.

like image 660
user3777863 Avatar asked Jun 26 '14 05:06

user3777863


1 Answers

(*SKIP)(*FAIL) Magic

This live PHP demo shows you the output of the two options below (with or without the semi-colon).

This is what you need:

$splits = preg_split('~\([^)]*\)(*SKIP)(*F)|;~', $sql);

See demo to see that we are splitting on the right semi-colons.

Output:

[0] => BEGIN
[1] => INSERT INTO TABLE_A (a, b, c) VALUES('42', '12', '\'ab\'c; DEF')
[2] => INSERT INTO TABLE_B (d, e, f) VALUES('42', '43', 'XY\'s Z ;uvw')
[3] => COMMIT
[4] =>

The empty item #4 is the match on the other side of the final ;. The other option is to keep the semi-colons (see below).

Option 2: Keep the Semi-Colons

If you want to keep the semi-colons, go with this:

$splits = preg_split('~\([^)]*\)(*SKIP)(*F)|(?<=;)(?![ ]*$)~', $sql);

Output:

[0] => BEGIN;
[1] => INSERT INTO TABLE_A (a, b, c) VALUES('42', '12', '\'ab\'c; DEF');
[2] => INSERT INTO TABLE_B (d, e, f) VALUES('42', '43', 'XY\'s Z ;uvw');
[3] => COMMIT;

Explanation

This problem is a classic case of the technique explained in this question to "regex-match a pattern, excluding..."

In left side of the alternation |, the regex \([^)]*\) matches complete (parentheses) then deliberately fails, after which the engine skips to the next position in the string. The right side matches the ; word you want, and we know they are the right ones because they were not matched by the expression on the left. It is now safe to split on it.

In Option 2, where we keep the semi-colons, our match on the right matches a position, but no characters. That position is asserted by the lookbehind (?<=;), which asserts that a ; immediately precedes the position, and the negative lookahead (?![ ]*$), which asserts that what follows is not optional spaces then the end of the string (so we avoid a last empty match).

Sample Code

Please examine the live PHP demo.

Reference

  • How to match (or replace) a pattern except in situations s1, s2, s3...
  • Article about matching a pattern unless...
  • Lookahead and Lookbehind Zero-Length Assertions
  • Mastering Lookahead and Lookbehind
  • Special Backtracking Control Verbs
like image 94
zx81 Avatar answered Nov 17 '22 00:11

zx81