Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pyparsing, forward, and recursion

I'm using pyparsing to parse vcd (value change dump) files. Essentially, I want to read in the files, parse it into an internal dictionary, and manipulate the values.

Without going into details on the structure, my problem occurs with identifying nested categories.

In vcd files, you have 'scopes' which include wires and possibly some deeper (nested) scopes. Think of them like levels.

So in my file, I have:

$scope module toplevel $end
$scope module midlevel $end
$var wire a $end
$var wire b $end
$upscope $end
$var wire c $end
$var wire d $end
$var wire e $end
$scope module extralevel $end
$var wire f $end
$var wire g $end
$upscope $end
$var wire h $end
$var wire i $end
$upscope $end

So 'toplevel' encompasses everything (a - i), 'midlevel' has (a - b), 'extralevel' has (f - g), etc.

Here is my code (snippet) for parsing this section:

scope_header = Group(Literal('$scope') + Word(alphas) + Word(alphas) + \
                     Literal('$end'))

wire_map = Group(Literal('$var') + Literal('wire') + Word(alphas) + \
                 Literal('$end'))

scope_footer = Group(Literal('$upscope') + Literal('$end'))

scope = Forward()
scope << (scope_header + ZeroOrMore(wire_map) + ZeroOrMore(scope) + \
          ZeroOrMore(wire_map) + scope_footer)

Now, what I thought happens is that as it hits each scope, it would keep track of each 'level' and I would end up with a structure containing nested scopes. However, it errors out on

$scope module extralevel $end

saying it expects '$upscope'.

So I know I'm not using the recursion correctly. Can someone help me out? Let me know if I need to provide more info.

Thanks!!!!

like image 710
RaytheonLiszt Avatar asked Nov 10 '10 03:11

RaytheonLiszt


2 Answers

According to your definition, a scope cannot contain another scope, followed by some maps, followed by another scope.

If the parser has a debug mode where it prints its parse tree, you will be able to see this immediately. But in short, you're saying there are zero or more maps, followed by zero or more scopes, followed by zero or more maps, so if there is a scope, followed by a map, you have already passed the scope field, so any more scopes are invalid. If the language used by pyparsing supports "or" you could use:

scope << (scope_header + ZeroOrMore((wire_map | scope)) + scope_footer)
like image 157
Zack Bloom Avatar answered Oct 07 '22 12:10

Zack Bloom


Please choose @ZackBloom's answer as the correct one, he intuited it right off, without even knowing pyparsing's syntax.

Just a few comments/suggestions on your grammar:

With the answer posted above, you can visualize the nesting using pprint and pyparsing's asList() method on ParseResults:

res = scope.parseString(vcd)

from pprint import pprint
pprint(res.asList())

Giving:

[[['$scope', 'module', 'toplevel', '$end'],
  [['$scope', 'module', 'midlevel', '$end'],
   ['$var', 'wire', 'a', '$end'],
   ['$var', 'wire', 'b', '$end'],
   ['$upscope', '$end']],
  ['$var', 'wire', 'c', '$end'],
  ['$var', 'wire', 'd', '$end'],
  ['$var', 'wire', 'e', '$end'],
  [['$scope', 'module', 'extralevel', '$end'],
   ['$var', 'wire', 'f', '$end'],
   ['$var', 'wire', 'g', '$end'],
   ['$upscope', '$end']],
  ['$var', 'wire', 'h', '$end'],
  ['$var', 'wire', 'i', '$end'],
  ['$upscope', '$end']]]

So now you have nicely structured results. But you can clean things up a bit. For one thing, now that you have structure, you don't really need all those $scope, $end, etc. tokens. You can certainly just step over them as you navigate through the parsed results, but you can also have pyparsing just drop them from the parsed output (since the results are now structured, you're not really losing anything). Change you parser definitions to:

SCOPE, VAR, UPSCOPE, END = map(Suppress, 
                                 "$scope $var $upscope $end".split())
MODULE, WIRE = map(Literal, "module wire".split())

scope_header = Group(SCOPE + MODULE + Word(alphas) + END)
wire_map = Group(VAR + WIRE + Word(alphas) + END) 
scope_footer = (UPSCOPE + END) 

(No need to group scope_footer - everything in that expression is suppressed, so Group would just give you an empty list.)

And now you can see more clearly the really important bits:

[[['module', 'toplevel'],
  [['module', 'midlevel'], ['wire', 'a'], ['wire', 'b']],
  ['wire', 'c'],
  ['wire', 'd'],
  ['wire', 'e'],
  [['module', 'extralevel'], ['wire', 'f'], ['wire', 'g']],
  ['wire', 'h'],
  ['wire', 'i']]]

At the risk of too much grouping, I would suggest also Grouping the content of your scope expression, like this:

scope << Group(scope_header + 
               Group(ZeroOrMore((wire_map | scope))) + 
               scope_footer) 

which gives these results:

[[['module', 'toplevel'],
  [[['module', 'midlevel'], [['wire', 'a'], ['wire', 'b']]],
   ['wire', 'c'],
   ['wire', 'd'],
   ['wire', 'e'],
   [['module', 'extralevel'], [['wire', 'f'], ['wire', 'g']]],
   ['wire', 'h'],
   ['wire', 'i']]]]

Now every scope result has 2 predictable elements: the module header, and a list of wires or subscopes. This predictability will make it a lot easier to write the recursive code that will navigate the results:

res = scope.parseString(vcd)
def dumpScope(parsedTokens, indent=''):
    module,contents = parsedTokens
    print indent + '- ' + module[1]
    for item in contents:
        if item[0]=='wire':
            print indent + '  wire: ' + item[1]
        else:
            dumpScope(item, indent+'  ')
dumpScope(res[0])

which comes out looking like:

- toplevel
  - midlevel
    wire: a
    wire: b
  wire: c
  wire: d
  wire: e
  - extralevel
    wire: f
    wire: g
  wire: h
  wire: i

Good first question, welcome to SO and pyparsing!

like image 45
PaulMcG Avatar answered Oct 07 '22 12:10

PaulMcG