Regular expression to get selected columns

Tags: regex, php, mysql

I have to extract the column names from a MySQL SELECT statement, and I'd like to do that with a regex.
It's a plain SELECT, something like:
SELECT column1, column2 ... FROM table

I have to cover every case: with or without an alias, with or without a table prefix, and with or without the quoting character:

SELECT column, column as foo, table.column, table.column as foo, 
       `column`, `column` as foo, `table`.`column`, `table`.`column` as foo
       .....

So far I've come up with this regex: #\w+(\sas)?#i, but it doesn't handle prefixed columns.
Any help?

By the way, is Regex good at this task?

EDIT
Thanks for the answers!
The patterns you posted work on the whole query, but I'm actually already processing each column separately:

$fields = Frameworkmethod::getSelectFields($query);
$columns = explode(',' , $fields);
foreach($columns as $column)
{
     //do Regex work to "clean up" the single field and get the "standard" one (not the alias)
     //`#__tracktime_projects`.`pr_name` AS `project_name` should return pr_name
}

As stated in the comment above, I always need the field name, not the alias one. Sorry for not pointing it out before!

tampe125 asked Apr 14 '13 18:04


2 Answers

I started from the technique described in "Collapse and Capture a Repeating Pattern in a Single Regex Expression" and adapted it for this purpose.

So, here is a hopefully bulletproof regex for capturing column names from a *SQL query:

/(?:SELECT\s++(?=(?:[#\w,`.]++\s++)+)|(?!^)\G\s*+,\s*+(?:`?+\s*+[#\w]++\s*+`?+\s*+\.\s*+)?+`?+\s*+)(\w++)`?+(?:\s++as\s++[^,\s]++)?+/ig

Explained Online demo: http://regex101.com/r/wL7yA9

PHP code using preg_match_all() with single RegEx, commented with /x modifier:

preg_match_all('/(?:SELECT\s++(?=(?:[\#\w,`.]++\s++)+) # start matching on SELECT
                |              # or
                (?!^)\G        # resume from last match position 
                \s*+,\s*+      # delimited by a comma 
                (?:`?+\s*+     # optional prefix table with optional backtick
                    [\#\w]++   # table name
                    \s*+`?+    # optional backtick
                    \s*+\.\s*+ # dot separator
                )?+ # optional prefix table end group

                `?+\s*+ # optional backtick

            ) # initial match or subsequent match

            (\w++)    # capturing group
            `?+         # optional backtick


            (?:\s++as\s++[^,\s]++)?+ # optional alias

            /ix', $query, $matches);

Live code: http://codepad.viper-7.com/VTaPd3
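For readers who want to see where the results end up, here is a minimal sketch that runs the same pattern (in its single-line form, with only the i modifier, since preg_match_all() is already global) on a small query. The sample query is an assumption made for illustration; the column names are collected in $matches[1].

```php
<?php
// Sample query invented for this sketch; it is not taken from the question.
$query = 'SELECT column1, table.column2 as foo, `column3` FROM table';

// Same pattern as above, written on one line (no /x, no comments).
$regex = '/(?:SELECT\s++(?=(?:[#\w,`.]++\s++)+)|(?!^)\G\s*+,\s*+(?:`?+\s*+[#\w]++\s*+`?+\s*+\.\s*+)?+`?+\s*+)(\w++)`?+(?:\s++as\s++[^,\s]++)?+/i';

preg_match_all($regex, $query, $matches);

// $matches[0] holds the full matches, $matches[1] the captured column names.
print_r($matches[1]); // column1, column2, column3
```

One thing to watch, as far as I can tell from reading the pattern: the branch that starts at SELECT goes straight to the capturing group, so the first column in the list seems to need to be a bare name (no backtick, no table prefix). Later columns go through the comma branch, which handles both.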

Note: "hopefully bulletproof" assumes the input is valid SQL.


PHP code using explode()

$columns = explode(',', $fields);

foreach($columns as $column)
{
    $regex='/([\w]++)`?+(?:\s++as\s++[^,\s]++)?+\s*+(?:FROM\s*+|$)/i';

    // preg_match() returns 1 on a match; skip columns the pattern cannot parse
    if (preg_match($regex, $column, $match)) {
        print $match[1]; // field stored in $match[1]
    }
}

Live code with example extraction: http://codepad.viper-7.com/OdUGXd
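The per-column approach can also be sketched as a small helper that returns the field name or null. The function name extractFieldName() and the sample inputs are assumptions made for illustration, and the nullable return type needs PHP 7.1 or later; the pattern is the one from the loop above.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the real
// field name for one comma-separated column expression, or null when the
// pattern does not match.
function extractFieldName(string $column): ?string
{
    $regex = '/([\w]++)`?+(?:\s++as\s++[^,\s]++)?+\s*+(?:FROM\s*+|$)/i';

    // trim() removes the whitespace that explode(',') leaves around each item.
    if (preg_match($regex, trim($column), $match) === 1) {
        return $match[1];
    }

    return null;
}

$samples = [
    'column',
    'table.column as foo',
    '`#__tracktime_projects`.`pr_name` AS `project_name`',
];

foreach ($samples as $sample) {
    var_dump(extractFieldName($sample)); // column, column, pr_name
}
```

Because the capturing group is [\w] only, a column whose own name contains # (for example `#wut`) would come back without the # character; widening the character class is one possible adjustment if that matters.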

CSᵠ answered Oct 23 '22 11:10


I used PHP:

$query = 'SELECT column1, column2 as foo, table.column3, table.column4 as foo, 
       `column5`, `column6` as foo, `table`.`column7`, `table`.`column8` as foo
       FROM table';

$query = preg_replace('/^SELECT(.*?)FROM.*$/s', '$1', $query); // To remove the "SELECT" and "FROM table..." parts

preg_match_all('/(?:
    (?:`?\w+`?\.)? (?:`)?(\w+)(?:`)? (?:\s*as\s*\w+)?\s*
#   ^--TableName-^ ^---ColumnName--^ ^----AsFoo-----^
)+/x',$query, $m);

print_r($m[1]);

Output:

Array
(
    [0] => column1
    [1] => column2
    [2] => column3
    [3] => column4
    [4] => column5
    [5] => column6
    [6] => column7
    [7] => column8
)

Live demo: http://www.rubular.com/r/H960NFKCTr


UPDATE: The "unusual" but valid SQL table names you're using (e.g. #__tracktime_projects) broke the regex above, because \w does not match #. To fix this, I added a variable holding the characters we expect in names. I also added the i modifier to make the match case-insensitive:

$query = 'SELECT column1, column2 as foo, table.column3, table.column4 as foo, 
       `column5`, `column6` as foo, `table`.`column7`, `table`.`column8` as foo, `#__tracktime_projects`.`pr_name` AS project_name, `#wut`
       FROM table';


$query = preg_replace('/^SELECT(.*?)FROM.*$/s', '$1', $query); // To remove the "SELECT" and "FROM table..." parts

$allowed = '\w#'; // Adjust this to the names that you expect.

preg_match_all('/(?:
    (?:`?['.$allowed.']++`?\.)?
#   ^--------TableName--------^

    (?:`)?(['.$allowed.']++)(?:`)?
#   ^----------ColumnName--------^

    (?:\s*as\s*['.$allowed.']++)?\s*
#   ^-------------AsFoo------------^
)+
/xi',$query, $m);

print_r($m[1]);

Output:

Array
(
    [0] => column1
    [1] => column2
    [2] => column3
    [3] => column4
    [4] => column5
    [5] => column6
    [6] => column7
    [7] => column8
    [8] => pr_name
    [9] => #wut
)

Live demo: http://www.rubular.com/r/D0iIHJQwB8
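The two steps used above (strip the SELECT and FROM parts, then match every column) could be wrapped in a function along these lines. This is only a sketch: the name extractSelectedColumns() and the sample query are made up for illustration, and the first step still assumes the query starts with an uppercase SELECT and contains an uppercase FROM, as in the original code.

```php
<?php
// Hypothetical wrapper, not part of the original answer.
// $allowed is placed inside a character class as-is, so it must already be
// valid there (for example '\w#').
function extractSelectedColumns(string $query, string $allowed = '\w#'): array
{
    // Keep only the column list between SELECT and the first FROM.
    $fields = preg_replace('/^SELECT(.*?)FROM.*$/s', '$1', $query);

    preg_match_all('/(?:
        (?:`?[' . $allowed . ']++`?\.)?        # optional table prefix
        (?:`)?([' . $allowed . ']++)(?:`)?     # column name (captured)
        (?:\s*as\s*[' . $allowed . ']++)?\s*   # optional alias
    )+/xi', $fields, $m);

    return $m[1];
}

print_r(extractSelectedColumns('SELECT a, t.b AS c, `#wut` FROM tbl')); // a, b, #wut
```

Passing the allowed characters as a parameter keeps the adjustment mentioned above in one place, at the cost of trusting the caller to supply something that is safe inside [...].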

HamZa answered Oct 23 '22 10:10