Parsing structured text file in python

I need to parse text files similar to the one below with Python, build a hierarchical object structure from the data, and then process it. This is very similar to what can be done with xml.etree.ElementTree and other XML parsers.

The syntax of these files, however, is not XML, and I'm wondering what the best way to implement such a parser is: subclass an existing XML parser (and if so, which one) and customize how it recognizes tags, write a custom parser, or something else?

{NETLIST topblock
{VERSION 2 0 0}

{CELL topblock
    {PORT gearshift_h vpsf vphreg pwron_h vinp vref_out vcntrl_out gd meas_vref 
      vb vout meas_vcntrl reset_h vinm }
    {INST XI21/Mdummy1=pch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/pch_18_mac" Length=0.152 NFIN=8 }
    {PIN vpsf=SRC gs_h=DRN vpsf=GATE vpsf=BULK }}
    {INST XI21/Mdummy2=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
    {PIN gs_h=SRC gd=DRN gd=GATE gd=BULK }}
    {INST XI20/Mdummy1=pch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/pch_18_mac" Length=0.152 NFIN=8 }
    {PIN vpsf=SRC gs_hn=DRN vpsf=GATE vpsf=BULK }}
    {INST XI20/Mdummy2=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
    {PIN gs_hn=SRC gd=DRN gd=GATE gd=BULK }}
    {INST XI19/Mdummy1=pch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/pch_18_mac" Length=0.152 NFIN=8 }
    {PIN vpsf=SRC net514=DRN vpsf=GATE vpsf=BULK }}
    {INST XI19/Mdummy2=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
    {PIN net514=SRC gd=DRN gd=GATE gd=BULK }}
    {INST XI21/MN0=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
    {PIN gd=SRC gs_h=DRN gs_hn=GATE gd=BULK }}
    {INST XI21/MP0=pch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/pch_18_mac" Length=0.152 NFIN=8 }
    {PIN vpsf=SRC gs_h=DRN gs_hn=GATE vpsf=BULK }}
    {INST XI20/MN0=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
...
}
}
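The "subclass an XML parser?" option can be checked directly. The customization point that Python's xml.etree.ElementTree.XMLParser offers is a target object whose start, end, data and close methods receive parse events, but tokenization (deciding what counts as a tag) is done by the underlying expat library. A small sketch, where EventTarget and try_xml are hypothetical helper names, shows that a target sees events for real XML while the netlist syntax is rejected before any Python hook runs:

```python
import xml.etree.ElementTree as ET


class EventTarget:
    """Hypothetical parser target that records the callbacks XMLParser makes.

    A target controls what gets built from start/end/data events;
    it has no say in which characters expat treats as markup.
    """

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        self.events.append(('start', tag))

    def end(self, tag):
        self.events.append(('end', tag))

    def data(self, text):
        self.events.append(('data', text))

    def close(self):
        return self.events


def try_xml(text):
    """Feed text to an XMLParser with a custom target; return (events, error)."""
    parser = ET.XMLParser(target=EventTarget())
    try:
        parser.feed(text)
        return parser.close(), None
    except ET.ParseError as exc:
        return None, exc


print(try_xml('<VERSION major="2"/>'))  # well-formed XML: the target sees start/end events
print(try_xml('{VERSION 2 0 0}'))       # netlist syntax: expat raises ParseError instead
```

In other words, configuring an XML parser changes what gets built, not what syntax is accepted, which points toward a separate parser for this format.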
asked Feb 13 '23 18:02 by jserras


1 Answer

As others have said in the comments: use an existing parser if there is one. If none exists, roll your own, but use a parser library to do it. Here is an example using Parcon:

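from pprint import pprint
from parcon import (Forward, SignificantLiteral, Word, alphanum_chars, Exact,
                    ZeroOrMore, CharNotIn, concat, OneOrMore)

block = Forward()                       # blocks nest, so declare before defining
quote = SignificantLiteral('"')         # keep the quote characters in the result
word = Word(alphanum_chars + '/_.')     # e.g. XI21/Mdummy1, pch_18_mac, 0.152
# A value is a bare word or a double-quoted string. Exact stops whitespace from
# being skipped inside the quotes; concat joins the characters into one string.
value = word | Exact(quote + ZeroOrMore(CharNotIn('"')) + quote)[concat]
pair = word + '=' + value               # key=value, e.g. Length=0.152
flag = word                             # bare word, e.g. MOS or a port name
attribute = pair | flag | block         # try pair first, or the key is taken as a flag
head = word                             # block keyword: NETLIST, CELL, INST, ...
body = ZeroOrMore(attribute)
block << '{' + head + body + '}'        # braces and '=' are dropped from the output
blocks = OneOrMore(block)

with open('<your file name>.txt') as infile:
    pprint(blocks.parse_string(infile.read()))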

Result:

[('NETLIST',
  ['topblock',
   ('VERSION', ['2', '0', '0']),
   ('CELL',
    ['topblock',
     ('PORT',
      ['gearshift_h',
       'vpsf',
       'vphreg',
       'pwron_h',
       'vinp',
       'vref_out',
       'vcntrl_out',
       'gd',
       'meas_vref',
       'vb',
       'vout',
       'meas_vcntrl',
       'reset_h',
       'vinm']),
     ('INST',
      [('XI21/Mdummy1', 'pch_18_mac'),
       ('TYPE', ['MOS']),
       ('PROP',
        [('n', '"sctg_inv1x/pch_18_mac"'),
         ('Length', '0.152'),
         ('NFIN', '8')]),
       ('PIN',
        [('vpsf', 'SRC'),
         ('gs_h', 'DRN'),
         ('vpsf', 'GATE'),
         ('vpsf', 'BULK')])]),
     ('INST',
        ...
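If adding a third-party dependency is not an option, the same (head, body) structure shown above can also be built with a short hand-written recursive-descent parser on top of a regular-expression tokenizer. This is a sketch under assumptions: the token rules are inferred from the sample file (here a bare word is any run of characters other than whitespace, braces, = and ", which is looser than the Parcon grammar's word), and TOKEN_RE, parse_block, parse_netlist and SAMPLE are hypothetical names.

```python
import re
from pprint import pprint

# Assumed token rules: a brace or '=', a double-quoted string,
# or a bare word (any run of chars other than whitespace, braces, '=', '"').
TOKEN_RE = re.compile(r'[{}=]|"[^"]*"|[^\s{}="]+')


def parse_block(tokens, i):
    """Parse '{' head item* '}' starting at tokens[i]; return (node, next_index)."""
    if tokens[i] != '{':
        raise SyntaxError(f'expected "{{" at token {i}, got {tokens[i]!r}')
    head = tokens[i + 1]
    if head in ('{', '}', '='):
        raise SyntaxError(f'expected a block keyword at token {i + 1}, got {head!r}')
    i += 2
    body = []
    while tokens[i] != '}':
        if tokens[i] == '{':                            # nested block
            child, i = parse_block(tokens, i)
            body.append(child)
        elif i + 1 < len(tokens) and tokens[i + 1] == '=':
            body.append((tokens[i], tokens[i + 2]))     # key=value pair
            i += 3
        else:
            body.append(tokens[i])                      # bare word
            i += 1
    return (head, body), i + 1                          # step past the closing '}'


def parse_netlist(text):
    """Return a list of (head, body) tuples, one per top-level block."""
    tokens = TOKEN_RE.findall(text)
    blocks, i = [], 0
    try:
        while i < len(tokens):
            block, i = parse_block(tokens, i)
            blocks.append(block)
    except IndexError:
        # Running off the end of the token list means a '}' is missing.
        raise SyntaxError('unexpected end of input (unbalanced braces?)') from None
    return blocks


# Abridged excerpt of the file from the question.
SAMPLE = '''{NETLIST topblock
{VERSION 2 0 0}
{CELL topblock
    {PORT gearshift_h vpsf vphreg }
    {INST XI21/Mdummy1=pch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/pch_18_mac" Length=0.152 NFIN=8 }
    {PIN vpsf=SRC gs_h=DRN vpsf=GATE vpsf=BULK }}
}
}'''

if __name__ == '__main__':
    pprint(parse_netlist(SAMPLE))
```

The trade-off compared with the Parcon grammar is that the rules live in control flow instead of being declared one per line, which gets harder to maintain as the format grows.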
answered Feb 23 '23 16:02 by pillmuncher
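Since the question asks for something that can be processed the way one would with xml.etree.ElementTree, one option is to convert the nested tuples from the parser into an Element tree and reuse its iter, find and findall methods (including its limited XPath support). A minimal sketch: the element/attribute mapping below is a hypothetical convention, not something the netlist format defines, and to_element is a hypothetical helper. Because blocks and key=value pairs are both 2-tuples in the parser output, it tells them apart by checking whether the second element is a list. Pairs become child elements rather than attributes because keys such as vpsf repeat within a single PIN block.

```python
import xml.etree.ElementTree as ET


def to_element(node):
    """Convert one (head, body) block into an ElementTree Element.

    Hypothetical mapping:
      nested block -> child element named after the block's head word
      key=value    -> <pair key="..." value="..."/> child element
      bare word    -> <word>...</word> child element
    """
    head, body = node
    elem = ET.Element(head)
    for item in body:
        if isinstance(item, str):
            ET.SubElement(elem, 'word').text = item
        elif isinstance(item[1], list):        # nested block: (head, [items])
            elem.append(to_element(item))
        else:                                  # pair: (key, value)
            key, value = item
            ET.SubElement(elem, 'pair', key=key, value=value)
    return elem


# Abridged copy of the parser output for the first two instances in the question.
blocks = [('NETLIST', ['topblock',
    ('VERSION', ['2', '0', '0']),
    ('CELL', ['topblock',
        ('PORT', ['gearshift_h', 'vpsf', 'vphreg']),
        ('INST', [('XI21/Mdummy1', 'pch_18_mac'),
                  ('TYPE', ['MOS']),
                  ('PROP', [('n', '"sctg_inv1x/pch_18_mac"'),
                            ('Length', '0.152'), ('NFIN', '8')]),
                  ('PIN', [('vpsf', 'SRC'), ('gs_h', 'DRN'),
                           ('vpsf', 'GATE'), ('vpsf', 'BULK')])]),
        ('INST', [('XI21/Mdummy2', 'nch_18_mac'),
                  ('TYPE', ['MOS']),
                  ('PROP', [('n', '"sctg_inv1x/nch_18_mac"'),
                            ('Length', '0.152'), ('NFIN', '5')]),
                  ('PIN', [('gs_h', 'SRC'), ('gd', 'DRN'),
                           ('gd', 'GATE'), ('gd', 'BULK')])])])])]

root = to_element(blocks[0])

# For each instance, the first direct <pair> is "instance=model", and
# the PIN pairs map terminal (value) to net (key).
for inst in root.iter('INST'):
    name = inst.find('pair')
    pins = {p.get('value'): p.get('key') for p in inst.find('PIN')}
    print(name.get('key'), name.get('value'), pins)

# ElementTree's XPath subset also works, e.g. collecting every NFIN value.
nfin = [p.get('value') for p in root.findall(".//PROP/pair[@key='NFIN']")]
print(nfin)
```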