
PyParsing Optional() hanging

When I use only Optional or ZeroOrMore, pyparsing seems to enter an infinite loop. The following code works, but the part marked "# Should work with pp.Optional()" should really be Optional, not OneOrMore. Should I add some sort of stopOn in this case?

The grammar is shown below. In it, [expr] means an optional expr (Optional), and [expr]... means an optional expr that can repeat (ZeroOrMore):

[PINS numPins ;
  [ - pinName + NET netName
  [+ SPECIAL]
  [+ DIRECTION {INPUT | OUTPUT | INOUT | FEEDTHRU}]
  [+ NETEXPR "netExprPropName defaultNetName"]
  [+ SUPPLYSENSITIVITY powerPinName]
  [+ GROUNDSENSITIVITY groundPinName]
  [+ USE {SIGNAL | POWER | GROUND | CLOCK | TIEOFF | ANALOG | SCAN | RESET}]
  [+ ANTENNAPINPARTIALMETALAREA value [LAYER layerName]] ...
  [+ ANTENNAPINPARTIALMETALSIDEAREA value [LAYER layerName]] ...
  [+ ANTENNAPINPARTIALCUTAREA value [LAYER layerName]] ...
  [+ ANTENNAPINDIFFAREA value [LAYER layerName]] ...
  [+ ANTENNAMODEL {OXIDE1 | OXIDE2 | OXIDE3 | OXIDE4}] ...
  [+ ANTENNAPINGATEAREA value [LAYER layerName]] ...
  [+ ANTENNAPINMAXAREACAR value LAYER layerName] ...
  [+ ANTENNAPINMAXSIDEAREACAR value LAYER layerName] ...
  [+ ANTENNAPINMAXCUTCAR value LAYER layerName] ...
  [ # The code shows only this section
    [+ PORT]
    [+ LAYER layerName
      [MASK maskNum]
      [SPACING minSpacing | DESIGNRULEWIDTH effectiveWidth] pt pt
    |+ POLYGON layerName
      [MASK maskNum]
      [SPACING minSpacing | DESIGNRULEWIDTH effectiveWidth] pt pt pt ...
    |+ VIA viaName
      [MASK viaMaskNum] pt
    ] ...
    [+ COVER pt orient | FIXED pt orient | PLACED pt orient]  # This must be Optional
    ]...
; ] ...
END PINS]

And this is the parser (only the PLACEMENT_PINS part is shown).

# PLACEMENT_PINS
    PORT = (ws_pin
            + pp.Keyword('PORT')('PORT')
           )

    MASK = pp.Group(pp.Keyword('MASK')
                    + number('maskNum')
                   ).setResultsName('MASK')

    SPACING = pp.Group(pp.Keyword('SPACING')
                       + number('minSpacing')
                      ).setResultsName('SPACING')

    DESIGNRULEWIDTH = pp.Group(pp.Keyword('DESIGNRULEWIDTH')
                               + number('effectiveWidth')
                              ).setResultsName('DESIGNRULEWIDTH')

    LAYER = pp.Group(ws_pin
                     + pp.Suppress(pp.Keyword('LAYER')) + identifier('layerName')
                     + pp.Optional(MASK)
                     + pp.Optional(SPACING | DESIGNRULEWIDTH)
                     + pp.OneOrMore(pp.Group(pt))('coord')
                    ).setResultsName('LAYER')

    POLYGON =  pp.Group(ws_pin
                        + pp.Suppress(pp.Keyword('POLYGON')) + identifier('layerName')
                        + pp.Optional(MASK)
                        + pp.Optional(SPACING | DESIGNRULEWIDTH)
                        + pp.OneOrMore(pp.Group(pt))('coord')
                       ).setResultsName('POLYGON')

    VIA =  pp.Group(ws_pin
                    + pp.Suppress(pp.Keyword('VIA')) + identifier('viaName')
                    + pp.Optional(MASK)
                    + pp.Group(pt)('coord')
                   ).setResultsName('VIA')

    COVER = pp.Group(ws_pin
                     + pp.Keyword('COVER')
                     + pp.Group(pt)('coord')
                     + ORIENT('orient')
                    ).setResultsName('COVER')
    FIXED = pp.Group(ws_pin
                     + pp.Keyword('FIXED')
                     + pp.Group(pt)('coord')
                     + ORIENT('orient')
                    ).setResultsName('FIXED')
    PLACED = pp.Group(ws_pin
                      + pp.Keyword('PLACED')
                      + pp.Group(pt)('coord')
                      + ORIENT('orient')
                     ).setResultsName('PLACED')

    PLACEMENT_PINS = pp.Group(pp.Optional(PORT)
                              + pp.ZeroOrMore(LAYER | POLYGON | VIA)
                              + pp.OneOrMore(COVER | FIXED | PLACED)  # Should work with pp.Optional(), but it doesn't.
                             )

    pin = pp.Group(pp.Suppress(begin_pin)
                   + pinName
                   + pp.Optional(SPECIAL)
                   + pp.Optional(DIRECTION)
                   + pp.Optional(NETEXPR)
                   + pp.Optional(SUPPLYSENSITIVITY)
                   + pp.Optional(GROUNDSENSITIVITY)
                   + pp.Optional(USE)
                   + pp.ZeroOrMore(ANTENNAPINPARTIALMETALAREA)
                   + pp.ZeroOrMore(ANTENNAPINPARTIALMETALSIDEAREA)
                   + pp.ZeroOrMore(ANTENNAPINPARTIALCUTAREA)
                   + pp.ZeroOrMore(ANTENNAPINDIFFAREA)
                   + pp.ZeroOrMore(ANTENNAMODEL)
                   + pp.ZeroOrMore(ANTENNAPINGATEAREA)
                   + pp.ZeroOrMore(ANTENNAPINMAXAREACAR)
                   + pp.ZeroOrMore(ANTENNAPINMAXSIDEAREACAR)
                   + pp.ZeroOrMore(ANTENNAPINMAXCUTCAR)
                   + pp.ZeroOrMore(PLACEMENT_PINS).setResultsName('PLACEMENT')
                   + pp.Suppress(linebreak)
                  ).setResultsName('pin', listAllMatches=True)

    pins = pp.Group(pp.Suppress(pins_id) + number('numPins') + pp.Suppress(linebreak)
                    + pp.ZeroOrMore(pin)
                    + pp.Suppress(end_pins_id)
                   ).setResultsName('PINS')

And here is an example of the text to be parsed:

PINS 165 ;
- clk + NET clk + DIRECTION INPUT + USE SIGNAL
  + LAYER M2 ( -25 0 ) ( 25 220 )
  + PLACED ( 0 81500 ) E ;
- rst + NET rst + DIRECTION INPUT + USE SIGNAL
  + LAYER M5 ( -25 0 ) ( 25 220 )
  + PLACED ( 96300 140000 ) S ;
- im_rsc_CSN + NET im_rsc_CSN + DIRECTION OUTPUT + USE SIGNAL
  + LAYER M3 ( -25 0 ) ( 25 220 )
  + PLACED ( 80300 140000 ) S ;
END PINS

In this example, if the "+ PLACED" lines are removed, the parser fails, because the grammar uses "pp.OneOrMore(COVER | FIXED | PLACED)" rather than "pp.Optional(COVER | FIXED | PLACED)".

Another section to be parsed is UNITS. All of its expressions are optional, i.e. the file may or may not contain "TIME NANOSECONDS 1000", and so on.

[UNITS
    [TIME NANOSECONDS convertFactor ;]
    [CAPACITANCE PICOFARADS convertFactor ;]
    [RESISTANCE OHMS convertFactor ;]
    [POWER MILLIWATTS convertFactor ;]
    [CURRENT MILLIAMPS convertFactor ;]
    [VOLTAGE VOLTS convertFactor ;]
    [DATABASE MICRONS LEFconvertFactor ;]
    [FREQUENCY MEGAHERTZ convertFactor ;]
END UNITS]

Here is the parser that hangs because all expressions are optional:

# DATABASE_MICRONS
DATABASE_MICRONS = (pp.Keyword('DATABASE MICRONS')
                    + number('convertFactor')
                    + linebreak
                   )
unit = pp.Group(pp.Optional(TIME_NANOSECONDS)
                + pp.Optional(CAPACITANCE_PICOFARADS)
                + pp.Optional(RESISTANCE_OHMS)
                + pp.Optional(POWER_MILLIWATTS)
                + pp.Optional(CURRENT_MILLIAMPS)
                + pp.Optional(VOLTAGE_VOLTS)
                + pp.Optional(DATABASE_MICRONS)
                + pp.Optional(FREQUENCY_MEGAHERTZ)
               ).setResultsName('unit', listAllMatches=True)

units = pp.Group(pp.Suppress(units_id)
                 + pp.OneOrMore(unit)
                 + pp.Suppress(end_units_id)
                ).setResultsName('UNITS')

However, if I replace one of the lines, for example "+ pp.Optional(DATABASE_MICRONS)", with "+ pp.OneOrMore(DATABASE_MICRONS)", it works, but then the file must contain that expression.

Example of UNITS section:

UNITS
 DATABASE MICRONS 1000 ;
END UNITS

So, how should I deal with grammars in which all expressions are optional?

Raphael asked Oct 16 '22 09:10



1 Answer

If all the elements in PLACEMENT_PINS are optional, then it will match the empty string. Matching ZeroOrMore of an expression that can match the empty string will loop forever, because each repetition succeeds without advancing through the input.
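
One way to keep a repeated group from matching the empty string is to write it so that every alternative begins with a mandatory element. The sketch below illustrates the idea with simplified stand-ins for PORT, LAYER and PLACED; the `plus`, `name`, `integer` and `pt` tokens and the reduced statement bodies are assumptions made for illustration, not the real DEF definitions, so treat it as an outline rather than a drop-in replacement for PLACEMENT_PINS:

```python
import pyparsing as pp

# Hypothetical, simplified tokens (the question does not show the real ones).
plus = pp.Suppress("+")
name = pp.Word(pp.alphanums + "_")
integer = pp.Regex(r"-?\d+")
pt = pp.Group(pp.Suppress("(") + integer + integer + pp.Suppress(")"))

PORT = plus + pp.Keyword("PORT")
LAYER = pp.Group(plus + pp.Keyword("LAYER") + name + pt + pt)
PLACED = pp.Group(plus + pp.Keyword("PLACED") + pt + name)

# Every alternative starts with a required element, so `placement`
# cannot succeed without consuming input, and PLACED stays optional.
placement = pp.Group(
    (PORT + pp.ZeroOrMore(LAYER) + pp.Optional(PLACED))
    | (pp.OneOrMore(LAYER) + pp.Optional(PLACED))
    | PLACED
)
placements = pp.ZeroOrMore(placement)

with_placed = placements.parseString(
    "+ LAYER M2 ( -25 0 ) ( 25 220 ) + PLACED ( 0 81500 ) E ;"
).asList()
without_placed = placements.parseString(
    "+ LAYER M2 ( -25 0 ) ( 25 220 ) ;"
).asList()
```

Both inputs parse and the repetition terminates, including the one where the "+ PLACED" part has been removed.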

Are all the ZeroOrMore's there because you don't know what the order will be? If so, consider using the '&' operator instead of '+'. a_expr & b_expr & c_expr will match the three expressions but in any order.
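
A small sketch of the '&' operator on three made-up keywords (A, B and C are placeholders, not part of the DEF grammar). It shows that all three are still required, only their order is free:

```python
import pyparsing as pp

a_expr = pp.Keyword("A")
b_expr = pp.Keyword("B")
c_expr = pp.Keyword("C")

# '&' builds a pp.Each: every operand must appear, in any order.
any_order = a_expr & b_expr & c_expr

tokens = any_order.parseString("C A B", parseAll=True).asList()

# Leaving one of the required expressions out is still a parse error.
try:
    any_order.parseString("A B", parseAll=True)
    missing_is_error = False
except pp.ParseException:
    missing_is_error = True
```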

EDIT: I understand that they are all optional, but because you have lumped them together into their own unit expression with everything Optional (and so matchable to the empty string) and are then OneOrMoreing them, this is another endless loop.
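
To see why this loops, it may help to look at what the all-Optional expression does when none of its parts is present. The sketch below uses two hypothetical stand-ins for the UNITS statements (the `number` and `semi` tokens are assumptions) and deliberately leaves out the repetition:

```python
import pyparsing as pp

number = pp.Word(pp.nums)
semi = pp.Suppress(";")

# Hypothetical stand-ins for two of the UNITS statements.
TIME_NANOSECONDS = pp.Keyword("TIME") + pp.Keyword("NANOSECONDS") + number + semi
DATABASE_MICRONS = pp.Keyword("DATABASE") + pp.Keyword("MICRONS") + number + semi

# Everything is Optional, so this expression can succeed on no input at all.
unit = pp.Optional(TIME_NANOSECONDS) + pp.Optional(DATABASE_MICRONS)

# Neither statement is present at "END UNITS", yet parsing does not fail:
# it succeeds and returns an empty result.
tokens_at_end = unit.parseString("END UNITS").asList()

# With a statement present, the same expression consumes it as expected.
tokens_at_stmt = unit.parseString("DATABASE MICRONS 1000 ;").asList()
```

A repetition wrapped around `unit` would get that same empty success at "END UNITS" on every attempt, which is why it never stops.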

When you say "they are all optional", I understand that they are all optional from the standpoint of defining a UNITS section. But the OneOrMore in units is already taking care of repetition. If an empty UNITS section is valid, then use ZeroOrMore.

These look like "unit phrases" to me: each is a multi-word qualifier on units, and any or all of them might be present, in any number.

Instead of adding them all as Optionals, define them as a single MatchFirst - "a unit phrase is one of the specific phrases". The outer OneOrMore will take care of the repetition and optionalizing:

unit_phrase = pp.Group(TIME_NANOSECONDS
                        | CAPACITANCE_PICOFARADS
                        | RESISTANCE_OHMS
                        | POWER_MILLIWATTS
                        | CURRENT_MILLIAMPS
                        | VOLTAGE_VOLTS
                        | DATABASE_MICRONS
                        | FREQUENCY_MEGAHERTZ)

units = pp.Group(pp.Suppress(units_id)
                 + pp.OneOrMore(unit_phrase)('unit')
                 + pp.Suppress(end_units_id)
                ).setResultsName('UNITS')
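
A minimal runnable sketch of this MatchFirst approach, applied to the sample UNITS section from the question. The `unit_stmt` helper, the `number` and `semi` tokens, and the keyword-based section markers are assumptions standing in for definitions the question does not show. ZeroOrMore is used here so that an empty section also parses:

```python
import pyparsing as pp

number = pp.Word(pp.nums)
semi = pp.Suppress(";")

def unit_stmt(first, second):
    """Hypothetical helper: one statement such as 'DATABASE MICRONS 1000 ;'."""
    return pp.Group(pp.Keyword(first) + pp.Keyword(second)
                    + number("convertFactor") + semi)

TIME_NANOSECONDS = unit_stmt("TIME", "NANOSECONDS")
DATABASE_MICRONS = unit_stmt("DATABASE", "MICRONS")

# A unit phrase is exactly one of the known statements; it never matches empty.
unit_phrase = TIME_NANOSECONDS | DATABASE_MICRONS

units = pp.Group(pp.Suppress(pp.Keyword("UNITS"))
                 + pp.ZeroOrMore(unit_phrase)("unit")
                 + pp.Suppress(pp.Keyword("END") + pp.Keyword("UNITS"))
                )("UNITS")

sample = """UNITS
 DATABASE MICRONS 1000 ;
END UNITS"""
parsed = units.parseString(sample, parseAll=True)
empty = units.parseString("UNITS\nEND UNITS", parseAll=True)
```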

If in fact these can all be optional but must occur only once, then defining an Each of Optionals is what you want, with no repetition:

unit = pp.Group(pp.Optional(TIME_NANOSECONDS)
                & pp.Optional(CAPACITANCE_PICOFARADS)
                & pp.Optional(RESISTANCE_OHMS)
                & pp.Optional(POWER_MILLIWATTS)
                & pp.Optional(CURRENT_MILLIAMPS)
                & pp.Optional(VOLTAGE_VOLTS)
                & pp.Optional(DATABASE_MICRONS)
                & pp.Optional(FREQUENCY_MEGAHERTZ)
               )

units = pp.Group(pp.Suppress(units_id)
                 + unit.setResultsName('unit')  # <-- no OneOrMore repetition now, let Each do the orderless matching
                 + pp.Suppress(end_units_id)
                ).setResultsName('UNITS')
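
The Each-of-Optionals variant can be tried out the same way. In this sketch the `unit_stmt` helper and the three statements are hypothetical stand-ins, and pp.Each (the class that '&' constructs) is written out explicitly. It shows statements being accepted out of order, while a repeated statement is rejected:

```python
import pyparsing as pp

number = pp.Word(pp.nums)
semi = pp.Suppress(";")

def unit_stmt(first, second):
    """Hypothetical helper: one statement such as 'TIME NANOSECONDS 100 ;'."""
    return pp.Group(pp.Keyword(first) + pp.Keyword(second) + number + semi)

# Each statement may appear at most once, in any order.
unit = pp.Each([pp.Optional(unit_stmt("TIME", "NANOSECONDS")),
                pp.Optional(unit_stmt("DATABASE", "MICRONS")),
                pp.Optional(unit_stmt("VOLTAGE", "VOLTS"))])

units = pp.Group(pp.Suppress(pp.Keyword("UNITS"))
                 + unit
                 + pp.Suppress(pp.Keyword("END") + pp.Keyword("UNITS")))

out_of_order = "UNITS\n VOLTAGE VOLTS 1 ;\n TIME NANOSECONDS 100 ;\nEND UNITS"
parsed = units.parseString(out_of_order, parseAll=True).asList()
seen = sorted(stmt[0] for stmt in parsed[0])

# A second TIME statement is not consumed by Each, so END UNITS fails to match.
repeated = "UNITS\n TIME NANOSECONDS 1 ;\n TIME NANOSECONDS 2 ;\nEND UNITS"
try:
    units.parseString(repeated, parseAll=True)
    repeat_is_error = False
except pp.ParseException:
    repeat_is_error = True
```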
PaulMcG answered Oct 20 '22 09:10
