pyparsing to parse a python function call in its most general form

I would like to use the excellent pyparsing package to parse a python function call in its most general form. I read one post that was somewhat useful here but still not general enough.

I would like to parse the following expression:

f(arg1,arg2,arg3,...,kw1=var1,kw2=var2,kw3=var3,...)

where

  1. arg1,arg2,arg3 ... are any kind of valid python objects (integer, real, list, dict, function, variable name ...)
  2. kw1, kw2, kw3 ... are valid python keyword names
  3. var1,var2,var3 are valid python objects

I was wondering whether a grammar could be defined for such a general template. Perhaps I am asking too much... Would you have any ideas?

Thank you very much for your help.

Eric

Eurydice asked Feb 17 '23 18:02


1 Answer

Is that all? Let's start with a simple informal BNF for this:

func_call ::= identifier '(' func_arg [',' func_arg]... ')'
func_arg ::= named_arg | arg_expr
named_arg ::= identifier '=' arg_expr
arg_expr ::= identifier | real | integer | string | dict_literal | list_literal | tuple_literal | func_call
identifier ::= (alpha|'_') (alpha|num|'_')*
alpha ::= some letter 'a'..'z' 'A'..'Z'
num ::= some digit '0'..'9'

Translating to pyparsing, work bottom-up:

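from pyparsing import (Word, alphas, alphanums, Forward, Group,
                       delimitedList, quotedString)

identifier = Word(alphas+'_', alphanums+'_')

# definitions of real, integer, dict_literal, list_literal, tuple_literal go here
# see further text below

# define a placeholder for func_call - we don't have it yet, but we need it now
func_call = Forward()

# quotedString covers both single- and double-quoted strings
string = quotedString

# func_call must come before identifier: alternatives are tried left to right,
# and identifier alone would match the name of a nested call and stop there
arg_expr = func_call | real | integer | string | dict_literal | list_literal | tuple_literal | identifier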

named_arg = identifier + '=' + arg_expr

# to define func_arg, must first see if it is a named_arg
# why do you think this is?
func_arg = named_arg | arg_expr

# now define func_call using '<<=' instead of '=', to "inject" the definition
# into the previously declared Forward
#
# Group each arg to keep its set of tokens separate, otherwise you just get one
# continuous list of parsed strings, which is almost as worthless as the original
# string
func_call <<= identifier + '(' + delimitedList(Group(func_arg)) + ')'
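
To try the grammar before the real literal definitions are available, here is a minimal self-contained sketch. The regex-based real and integer and the small list_literal are simplified stand-ins of my own, not the definitions from parsePythonValue.py. Two further choices are assumptions on my part: func_call is tried ahead of identifier so that nested calls parse, and the argument list is wrapped in Optional so that f() is accepted.

```python
from pyparsing import (Word, alphas, alphanums, Forward, Group, Optional,
                       Regex, Suppress, delimitedList, quotedString)

identifier = Word(alphas + '_', alphanums + '_')

# Simplified stand-ins (assumptions) for the literal definitions
real = Regex(r'[+-]?\d+\.\d*([eE][+-]?\d+)?')
integer = Regex(r'[+-]?\d+')
string = quotedString

func_call = Forward()
arg_expr = Forward()

# A bare-bones list literal: '[' expr, expr, ... ']', brackets suppressed
list_literal = Group(Suppress('[') + Optional(delimitedList(arg_expr)) + Suppress(']'))

# func_call is tried before identifier so that "g(x)" is not cut short at "g"
arg_expr <<= func_call | real | integer | string | list_literal | identifier

named_arg = identifier + '=' + arg_expr
func_arg = named_arg | arg_expr

# Optional(...) lets a call with no arguments parse as well
func_call <<= identifier + '(' + Optional(delimitedList(Group(func_arg))) + ')'

result = func_call.parseString("f(1, 2.5, [a, b], g(x), kw='s')", parseAll=True)
print(result.asList())
```

Each argument comes back as its own sub-list, so a keyword argument is easy to spot by the '=' in second position.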

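Those arg_expr elements could take a while to work through, but fortunately, you can get them off the pyparsing wiki's Examples page: http://pyparsing.wikispaces.com/file/view/parsePythonValue.py (Wikispaces has since shut down; parsePythonValue.py is also included in the examples directory of the pyparsing source repository).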

from parsePythonValue import (integer, real, dictStr as dict_literal, 
                              listStr as list_literal, tupleStr as tuple_literal)

You still might get args passed using *list_of_args or **dict_of_named_args notation. Expand arg_expr to support these:

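deref_list = '*' + (identifier | list_literal | tuple_literal)
deref_dict = '**' + (identifier | dict_literal)

# rebinding arg_expr does not change named_arg or func_arg, so define it
# before them (or redefine them afterwards)
arg_expr = func_call | real | integer | string | dict_literal | list_literal | tuple_literal | identifier | deref_list | deref_dict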
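
As a variation, the starred forms can be attached to func_arg rather than arg_expr, so that something like k=*args is rejected. The sketch below is only one way to arrange it; the integer regex is a simplified stand-in and the starred forms accept identifiers only, both assumptions made to keep the example short.

```python
from pyparsing import (Word, alphas, alphanums, Forward, Group, Optional,
                       Regex, delimitedList)

identifier = Word(alphas + '_', alphanums + '_')
integer = Regex(r'[+-]?\d+')  # simplified stand-in (assumption)

func_call = Forward()
arg_expr = func_call | integer | identifier

# Starred forms live at the func_arg level, not inside arg_expr,
# so they cannot appear on the right-hand side of a keyword argument
deref_dict = '**' + identifier
deref_list = '*' + identifier
named_arg = identifier + '=' + arg_expr
func_arg = deref_dict | deref_list | named_arg | arg_expr

func_call <<= identifier + '(' + Optional(delimitedList(Group(func_arg))) + ')'

parsed = func_call.parseString("f(1, *args, k=2, **kwargs)", parseAll=True)
print(parsed.asList())
```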

Write yourself some test cases now - start simple and work your way up to complicated:

sin(30)
sin(a)
hypot(a,b)
len([1,2,3])
max(*list_of_vals)
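
These cases can be run in one go with pyparsing's runTests helper. The grammar below is a cut-down stand-in with just enough pieces (integers, identifiers, a simple list literal, *args) to cover the five strings; it is a sketch for trying things out, not the full grammar.

```python
from pyparsing import (Word, alphas, alphanums, Forward, Group, Optional,
                       Regex, Suppress, delimitedList)

identifier = Word(alphas + '_', alphanums + '_')
integer = Regex(r'[+-]?\d+')  # simplified stand-in (assumption)

func_call = Forward()
arg_expr = Forward()
list_literal = Group(Suppress('[') + Optional(delimitedList(arg_expr)) + Suppress(']'))
arg_expr <<= func_call | integer | list_literal | identifier

deref_list = '*' + (identifier | list_literal)
named_arg = identifier + '=' + arg_expr
func_arg = deref_list | named_arg | arg_expr
func_call <<= identifier + '(' + Optional(delimitedList(Group(func_arg))) + ')'

tests = [
    "sin(30)",
    "sin(a)",
    "hypot(a,b)",
    "len([1,2,3])",
    "max(*list_of_vals)",
]

# runTests parses each string, prints the tokens or the error location,
# and returns an overall success flag plus the per-test results
success, results = func_call.runTests(tests)
print("all passed:", success)
```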

Additional argument types that will need to be added to arg_expr (left as further exercise for the OP):

  • indexed arguments : dictval['a'], divmod(10,3)[0], range(10)[::2]

  • object attribute references : a.b.c

  • arithmetic expressions : sin(30), sin(a+2*b)

  • comparison expressions : sin(a+2*b) > 0.5, 10 < a < 20

  • boolean expressions : a or b and not (d or c and b)

  • lambda expression : lambda x : sin(x+math.pi/2)

  • list comprehension

  • generator expression
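
As a starting point for two of these items (attribute references and arithmetic expressions), here is a rough sketch built on pyparsing's infixNotation. The names dotted_name, number and arith_expr are mine, only the four basic operators are handled, and keyword arguments are left out to keep it short.

```python
from pyparsing import (Word, alphas, alphanums, Forward, Group, Optional,
                       Regex, delimitedList, infixNotation, oneOf, opAssoc)

identifier = Word(alphas + '_', alphanums + '_')

# a.b.c is kept as a single dotted token
dotted_name = delimitedList(identifier, delim='.', combine=True)
number = Regex(r'\d+(\.\d*)?')  # simplified stand-in (assumption)

func_call = Forward()
operand = func_call | number | dotted_name

# Precedence levels are listed from tightest-binding to loosest
arith_expr = infixNotation(operand, [
    (oneOf('* /'), 2, opAssoc.LEFT),
    (oneOf('+ -'), 2, opAssoc.LEFT),
])

func_call <<= Group(dotted_name + '(' + Optional(delimitedList(arith_expr)) + ')')

print(func_call.parseString("sin(a+2*b)", parseAll=True).asList())
print(func_call.parseString("f(x.y, math.pi/2)", parseAll=True).asList())
```

Comparison and boolean operators could be added as further levels in the same infixNotation table.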

PaulMcG answered Feb 20 '23 09:02