Using PLY to parse SQL statements

I know there are other tools out there for parsing SQL statements, but I am rolling my own for educational purposes. I am stuck on my grammar right now. If you can spot the error, please let me know.

SELECT = r'SELECT'
FROM = r'FROM'
COLUMN = TABLE = r'[a-zA-Z]+'
COMMA = r','
STAR = r'\*'
END = r';'
t_ignore = ' ' #ignores spaces

statement : SELECT columns FROM TABLE END

columns : STAR
        | rec_columns

rec_columns : COLUMN
            | rec_columns COMMA COLUMN

When I try to parse a statement like 'SELECT a FROM b;' I get a syntax error at the FROM token. Any help is greatly appreciated!

(Edit) Code:

#!/usr/bin/python
import ply.lex as lex
import ply.yacc as yacc

tokens = (
    'SELECT',
    'FROM',
    'WHERE',
    'TABLE',
    'COLUMN',
    'STAR',
    'COMMA',
    'END',
)

t_SELECT    = r'select|SELECT'
t_FROM      = r'from|FROM'
t_WHERE     = r'where|WHERE'
t_TABLE     = r'[a-zA-Z]+'
t_COLUMN    = r'[a-zA-Z]+'
t_STAR      = r'\*'
t_COMMA     = r','
t_END       = r';'

t_ignore    = ' \t'

def t_error(t):
    print 'Illegal character "%s"' % t.value[0]
    t.lexer.skip(1)

lex.lex()

NONE, SELECT, INSERT, DELETE, UPDATE = range(5)
states = ['NONE', 'SELECT', 'INSERT', 'DELETE', 'UPDATE']
current_state = NONE

def p_statement_expr(t):
    'statement : expression'
    print states[current_state], t[1]

def p_expr_select(t):
    'expression : SELECT columns FROM TABLE END'
    global current_state
    current_state = SELECT
    print t[3]


def p_recursive_columns(t):
    '''recursive_columns : recursive_columns COMMA COLUMN'''
    t[0] = ', '.join([t[1], t[3]])

def p_recursive_columns_base(t):
    '''recursive_columns : COLUMN'''
    t[0] = t[1]

def p_columns(t):
    '''columns : STAR
               | recursive_columns''' 
    t[0] = t[1]

def p_error(t):
    print 'Syntax error at "%s"' % t.value if t else 'NULL'
    global current_state
    current_state = NONE

yacc.yacc()


while True:
    try:
        input = raw_input('sql> ')
    except EOFError:
        break
    yacc.parse(input)

asked Sep 08 '11 22:09

sampwing

1 Answer

I think your problem is that the regular expressions for t_TABLE and t_COLUMN also match your reserved words (SELECT and FROM). In other words, SELECT a FROM b; tokenizes to something like COLUMN COLUMN COLUMN COLUMN END (or some other ambiguous tokenization), which doesn't match any of your productions, so you get a syntax error.

As a quick sanity check, change those regular expressions to match exactly what you're typing in like this:

t_TABLE = r'b'
t_COLUMN = r'a'

You will see that SELECT a FROM b; now parses, because the regular expressions 'a' and 'b' don't match your reserved words.

There's a second problem as well: the regular expressions for TABLE and COLUMN overlap with each other (in your code they are identical), so the lexer can't tell those two tokens apart either.
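A quick way to see both overlaps without involving PLY at all is to run the patterns from the question through Python's re module. This is only an illustrative check, not how PLY builds its lexer internally; the `patterns` dict and the `matching_tokens` helper are names made up for this example.

```python
import re

# Token patterns copied from the question's lexer rules.
patterns = {
    'SELECT': r'select|SELECT',
    'FROM': r'from|FROM',
    'WHERE': r'where|WHERE',
    'TABLE': r'[a-zA-Z]+',
    'COLUMN': r'[a-zA-Z]+',
}

def matching_tokens(word):
    """Return every token name whose pattern matches the whole word."""
    return sorted(name for name, rx in patterns.items()
                  if re.fullmatch(rx, word))

# Each word of 'SELECT a FROM b' is a candidate for several token types.
for word in ['SELECT', 'a', 'FROM', 'b']:
    print(word, '->', matching_tokens(word))
```

Every word matches more than one pattern, so which token type reaches the parser depends on the order in which the lexer tries its rules, not on where the word sits in the statement.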

There's a subtle but relevant section in the PLY documentation about this. I'm not sure of the best way to explain it, but the trick is that the tokenization pass happens first, so the lexer can't use context from your production rules to know whether it has come across a TABLE token or a COLUMN token. You need to generalize those into some kind of ID token and then sort things out during the parse.

If I had some more energy I'd try to work through your code some more and provide an actual solution in code, but since you've already said this is a learning exercise, perhaps you will be content with me pointing you in the right direction.
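For anyone who wants that direction spelled out, here is a minimal sketch of the ID-token approach. It is not the asker's final code: it assumes PLY 3.x is installed, is written for Python 3 rather than the Python 2 of the question, and leaves out WHERE to stay short. The `reserved` lookup inside a single identifier rule follows the pattern described in the PLY documentation's section on reserved words; the rule names `column_list`, `p_columns_star` and so on, and the dict returned by the parser, are my own choices for illustration.

```python
import ply.lex as lex
import ply.yacc as yacc

# Reserved words get their own token types; every other word is an ID.
reserved = {'SELECT': 'SELECT', 'FROM': 'FROM'}

tokens = ['ID', 'STAR', 'COMMA', 'END'] + list(reserved.values())

t_STAR = r'\*'
t_COMMA = r','
t_END = r';'
t_ignore = ' \t'

def t_ID(t):
    r'[a-zA-Z]+'
    # One rule matches all words; keyword vs. identifier is decided
    # afterwards by a case-insensitive lookup.
    t.type = reserved.get(t.value.upper(), 'ID')
    return t

def t_error(t):
    print('Illegal character "%s"' % t.value[0])
    t.lexer.skip(1)

def p_statement(p):
    'statement : SELECT columns FROM ID END'
    # The grammar position, not the lexer, says this ID is a table name.
    p[0] = {'columns': p[2], 'table': p[4]}

def p_columns_star(p):
    'columns : STAR'
    p[0] = ['*']

def p_columns_list(p):
    'columns : column_list'
    p[0] = p[1]

def p_column_list_base(p):
    'column_list : ID'
    p[0] = [p[1]]

def p_column_list_more(p):
    'column_list : column_list COMMA ID'
    p[0] = p[1] + [p[3]]

def p_error(p):
    print('Syntax error at "%s"' % p.value if p else 'Syntax error at end of input')

lexer = lex.lex()
parser = yacc.yacc(debug=False)  # debug=False: no parser.out debug file

result = parser.parse('SELECT a, b FROM t;', lexer=lexer)
print(result)
```

With this grammar the statement above should come back as {'columns': ['a', 'b'], 'table': 't'}. The lexer no longer has to guess between TABLE and COLUMN; both are plain ID tokens, and the production they appear in gives them their meaning.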

answered Sep 20 '22 02:09

Joe Holloway

Tags: python, sql, parsing, context-free-grammar, ply
