I'm using pyparsing to parse vcd (value change dump) files. Essentially, I want to read in the files, parse it into an internal dictionary, and manipulate the values.
Without going into details on the structure, my problem occurs with identifying nested categories.
In vcd files, you have 'scopes' which include wires and possibly some deeper (nested) scopes. Think of them like levels.
So in my file, I have:
$scope module toplevel $end
$scope module midlevel $end
$var wire a $end
$var wire b $end
$upscope $end
$var wire c $end
$var wire d $end
$var wire e $end
$scope module extralevel $end
$var wire f $end
$var wire g $end
$upscope $end
$var wire h $end
$var wire i $end
$upscope $end
So 'toplevel' encompasses everything (a - i), 'midlevel' has (a - b), 'extralevel' has (f - g), etc.
Here is my code (snippet) for parsing this section:
scope_header = Group(Literal('$scope') + Word(alphas) + Word(alphas) + \
Literal('$end'))
wire_map = Group(Literal('$var') + Literal('wire') + Word(alphas) + \
Literal('$end'))
scope_footer = Group(Literal('$upscope') + Literal('$end'))
scope = Forward()
scope << (scope_header + ZeroOrMore(wire_map) + ZeroOrMore(scope) + \
ZeroOrMore(wire_map) + scope_footer)
Now, what I thought happens is that as it hits each scope, it would keep track of each 'level' and I would end up with a structure containing nested scopes. However, it errors out on
$scope module extralevel $end
saying it expects '$upscope'.
So I know I'm not using the recursion correctly. Can someone help me out? Let me know if I need to provide more info.
Thanks!!!!
According to your definition, a scope cannot contain another scope, followed by some maps, followed by another scope.
If the parser has a debug mode where it prints its parse tree, you will be able to see this immediately. But in short, you're saying there are zero or more maps, followed by zero or more scopes, followed by zero or more maps, so if there is a scope, followed by a map, you have already passed the scope field, so any more scopes are invalid. If the language used by pyparsing supports "or" you could use:
scope << (scope_header + ZeroOrMore((wire_map | scope)) + scope_footer)
Please choose @ZackBloom's answer as the correct one, he intuited it right off, without even knowing pyparsing's syntax.
Just a few comments/suggestions on your grammar:
With the answer posted above, you can visualize the nesting using pprint and pyparsing's asList()
method on ParseResults:
res = scope.parseString(vcd)
from pprint import pprint
pprint(res.asList())
Giving:
[[['$scope', 'module', 'toplevel', '$end'],
[['$scope', 'module', 'midlevel', '$end'],
['$var', 'wire', 'a', '$end'],
['$var', 'wire', 'b', '$end'],
['$upscope', '$end']],
['$var', 'wire', 'c', '$end'],
['$var', 'wire', 'd', '$end'],
['$var', 'wire', 'e', '$end'],
[['$scope', 'module', 'extralevel', '$end'],
['$var', 'wire', 'f', '$end'],
['$var', 'wire', 'g', '$end'],
['$upscope', '$end']],
['$var', 'wire', 'h', '$end'],
['$var', 'wire', 'i', '$end'],
['$upscope', '$end']]]
So now you have nicely structured results. But you can clean things up a bit. For one thing, now that you have structure, you don't really need all those $scope
, $end
, etc. tokens. You can certainly just step over them as you navigate through the parsed results, but you can also have pyparsing just drop them from the parsed output (since the results are now structured, you're not really losing anything). Change you parser definitions to:
SCOPE, VAR, UPSCOPE, END = map(Suppress,
"$scope $var $upscope $end".split())
MODULE, WIRE = map(Literal, "module wire".split())
scope_header = Group(SCOPE + MODULE + Word(alphas) + END)
wire_map = Group(VAR + WIRE + Word(alphas) + END)
scope_footer = (UPSCOPE + END)
(No need to group scope_footer
- everything in that expression is suppressed, so Group
would just give you an empty list.)
And now you can see more clearly the really important bits:
[[['module', 'toplevel'],
[['module', 'midlevel'], ['wire', 'a'], ['wire', 'b']],
['wire', 'c'],
['wire', 'd'],
['wire', 'e'],
[['module', 'extralevel'], ['wire', 'f'], ['wire', 'g']],
['wire', 'h'],
['wire', 'i']]]
At the risk of too much grouping, I would suggest also Group
ing the content of your scope
expression, like this:
scope << Group(scope_header +
Group(ZeroOrMore((wire_map | scope))) +
scope_footer)
which gives these results:
[[['module', 'toplevel'],
[[['module', 'midlevel'], [['wire', 'a'], ['wire', 'b']]],
['wire', 'c'],
['wire', 'd'],
['wire', 'e'],
[['module', 'extralevel'], [['wire', 'f'], ['wire', 'g']]],
['wire', 'h'],
['wire', 'i']]]]
Now every scope result has 2 predictable elements: the module header, and a list of wires or subscopes. This predictability will make it a lot easier to write the recursive code that will navigate the results:
res = scope.parseString(vcd)
def dumpScope(parsedTokens, indent=''):
module,contents = parsedTokens
print indent + '- ' + module[1]
for item in contents:
if item[0]=='wire':
print indent + ' wire: ' + item[1]
else:
dumpScope(item, indent+' ')
dumpScope(res[0])
which comes out looking like:
- toplevel
- midlevel
wire: a
wire: b
wire: c
wire: d
wire: e
- extralevel
wire: f
wire: g
wire: h
wire: i
Good first question, welcome to SO and pyparsing!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With