Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Debugging Pyparsing Grammar

I'm building a parser for an imaginary programming language called C-- (not the actual C-- language). I've gotten to the stage where I need to translate the language's grammar into something Pyparsing can accept. Unfortunatly when I come to parse my input string (which is correct and should not cause Pyparsing to error) it's not parsing correctly. I fear this is due to errors in my grammar, but as I'm starting Pyparsing for the first time, I can't seem to see where I'm going wrong.

I've uploaded the grammar that I'm translating from here for people to have a read through.

EDIT: Updated with the advice from Paul.

This is the grammer I've currently got (the two top lines of Syntax definition are terribly bad of me I know):

# Lexical structure definition
ifS = Keyword('if')
elseS = Keyword('else')
whileS = Keyword('while')
returnS = Keyword('return')
intVar = Keyword('int')
voidKeyword = Keyword('void')
sumdiff = Literal('+') | Literal('-')
prodquot = Literal('*') | Literal('/')
relation = Literal('<=') | Literal('<') | Literal('==') | \
           Literal('!=') | Literal('>') | Literal('=>')
lbrace = Literal('{')
rbrace = Literal('}')
lparn = Literal('(')
rparn = Literal(')')
semi = Literal(';')
comma = Literal(',')
number = Word(nums)
identifier = Word(alphas, alphanums)

# Syntax definition
term = ''
statement = ''
variable    =   intVar + identifier + semi
locals      =   ZeroOrMore(variable)
expr        =   term | OneOrMore(Group(sumdiff + term))
args        =   ZeroOrMore(OneOrMore(Group(expr + comma)) | expr)
funccall    =   Group(identifier + lparn + args + rparn)
factor      =   Group(lparn + expr + rparn) | identifier | funccall | number
term        =   factor | OneOrMore(prodquot + factor)
cond        =   Group(lparn + expr + relation + expr + rparn)
returnState =   Group(returnS + semi) | Combine(returnS + expr + semi)
assignment  =   Group(identifier + '=' + expr + semi)
proccall    =   Group(identifier + lparn + args + rparn + semi)
block       =   Group(lbrace + locals + statement + rbrace)
iteration   =   Group(whileS + cond + block)
selection   =   Group(ifS + cond + block) | Group(ifS + cond + block + elseS + block)
statement   =   OneOrMore(proccall | assignment | selection | iteration | returnState)
param       =   Group(intVar + identifier)
paramlist   =   OneOrMore(Combine(param + comma)) | param
params      =   paramlist | voidKeyword
procedure   =   Group(voidKeyword + identifier + lparn + params + rparn + block)
function    =   Group(intVar + identifier + lparn + params + rparn + block)
declaration =   variable | function | procedure
program     =   OneOrMore(declaration)

I'd like to know if there are any mistakes I've made in translating the grammar across and what improvements I could do to make it simplified whilst adhering to the grammar I've been given.

EDIT 2: Updated to include the new error.

Here is the input string I am parsing:

int larger ( int first , int second ) { 
if ( first > second ) { 
return first ; 
} else { 
return second ; 
} 
} 

void main ( void ) { 
int count ; 
int sum ; 
int max ; 
int x ; 

x = input ( ) ; 
max = x ; 
sum = 0 ; 
count = 0 ; 

while ( x != 0 ) { 
count = count + 1 ; 
sum = sum + x ; 
max = larger ( max , x ) ; 
x = input ( ) ; 
} 

output ( count ) ; 
output ( sum ) ; 
output ( max ) ; 
} 

And this is the error message I get when running my program from Terminal:

/Users/Joe/Documents/Eclipse Projects/Parser/src/pyparsing.py:1156: SyntaxWarning: null string passed to Literal; use Empty() instead
other = Literal( other )
/Users/Joe/Documents/Eclipse Projects/Parser/src/pyparsing.py:1258: SyntaxWarning: null string passed to Literal; use Empty() instead
other = Literal( other )
Expected ")" (at char 30), (line:6, col:26)
None
like image 448
greenie Avatar asked Dec 01 '09 01:12

greenie


1 Answers

1) Change Literal("if") to Keyword("if") (and so on, down to Literal("void")), to prevent matching the leading "if" of a variable named "ifactor".

2) nums, alphas, and alphanums are not expressions, they are strings, that can be used with the Word class to define some typical sets of characters when defining "words" like "a number is a word made up of nums", or "an identifier is a word that starts with an alpha, followed by zero or more alphanums." So instead of:

number = nums
identifier = alphas + OneOrMore(alphanums)

you want

number = Word(nums)
identifier = Word(alphas, alphanums)

3) Instead of Combine, I think you want Group. Use Combine when you want the matched tokens to be contiguous with no intervening whitespace, and will concatenate the tokens and return them as a single string. Combine is often used in cases like this:

realnum = Combine(Word(nums) + "." + Word(nums))

Without Combine, parsing "3.14" would return the list of strings ['3', '.', '14'], so we add Combine so that the parsed result for realnum is '3.14' (which you could then pass to a parse action to convert to the actual floating value 3.14). Combines enforcement of no intervening whitespace also keeps us from accidentally parsing 'The answer is 3. 10 is too much.' and thinking the "3. 10" represents a real number.

4) This should not cause your error, but your input string has lots of extra spaces. If you get your grammar working, you should be able to parse "int x;" just as well as "int x ;".

Hope some of these hints get you going. Have you read any online pyparsing articles or tutorials? And please look through the online examples. You'll need to get a good grasp of how Word, Literal, Combine, etc. perform their individual parsing tasks.

5) You have mis-implemented the recursive definitions for term and statement. Instead of assigning '' to them, write:

term = Forward()
statement = Forward()

Then when you go to actually define them with their recursive definitions, use the << operator (and be sure to enclose the RHS in ()'s).

term << (... term definition ...)
statement << (... statement definition ...)

You can find an example of a recursive parser here, and a presentation on basic pyparsing usage here - see the section titled "Parsing Lists" for more step-by-step on how the recursion is handled.

like image 200
PaulMcG Avatar answered Nov 06 '22 04:11

PaulMcG