Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular expression to extract SQL query

Is there a regex which extracts SQL queries from a string? I'm NOT interested to validate any SQL syntax, rather and only extracting a selection of SQL commands. This to parse a given SQL file/string in a flexible manner.

Given is the following SQL file/string example:

SELECT
    *
FROM
    test_table
WHERE
    test_row = 'Testing ; semicolon';

SELECT * FROM another_test_table;

INSERT INTO 
    table_name 
VALUES 
    (value1,'value which contains semicolon ;;;;',value3,...);

Some pseudocode example would be: ^(UPDATE|SELECT|INSERT INTO)(.*)(;)$. In the future i'm looking to extend this with all (possible) commands.

  • Look for a starting match with either: (UPDATE|SELECT|INSERT|INTO)
  • Zero or more any character (including whitespaces and newlines)
  • Stop at ;, which delimits the SQL query.

Whenever this would be possible via a regex the following java code is able to extract all SQL commands:

final String regex = "LOOKING_FOR_THIS_ONE";
final Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = p.matcher(content);

while (matcher.find()) {
  // matcher.group() now contains the full SQL command
}

Thanks in advance!

like image 776
Velth Avatar asked Oct 22 '22 09:10

Velth


2 Answers

I'll start off by saying that this is not a good way of doing it, and strongly urge you to find another method of doing it, preferrably tagging it properly where the statements are made, so you don't end up in this situation.

That being said, SQL requires it to start with one of the following; DELETE, SELECT, WITH, UPDATE or INSERT INTO. It also requires that the input ends with ;.

We can use this to grab all sequences matching SQL with the following:

final String regex = "^(INSERT INTO|UPDATE|SELECT|WITH|DELETE)(?:[^;']|(?:'[^']+'))+;\\s*$";
final Pattern p = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL);

Group 1 now holds the operating word, in case you wish to filter valid SQL on UPDATE or SELECT.

See the regex in action, as well as a cave-at here:

https://regex101.com/r/dt9XTK/2

like image 53
melwil Avatar answered Oct 24 '22 04:10

melwil


You can match it "properly" as long as the semicolon is the last non-whitespace character on that line.

final String regex = ^(SELECT|UPDATE|INSERT)[\s\S]+?\;\s*?$

final Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = p.matcher(content);
like image 40
Bart Avatar answered Oct 24 '22 03:10

Bart