Parsing an existing config file

I have a config file that is in the following form:

protocol sample_thread {
    { AUTOSTART 0 }
    { BITMAP thread.gif }
    { COORDS {0 0} }
    { DATAFORMAT {
        { TYPE hl7 }
        { PREPROCS {
            { ARGS {{}} }
            { PROCS sample_proc }
        } }
    } } 
}

The real file may not have these exact fields, and I'd rather not have to describe the structure of the data to the parser before it parses.

I've looked for other configuration file parsers, but none that I've found seem to be able to accept a file of this syntax.

I'm looking for a module that can parse a file like this. Any suggestions?

If anyone is curious, the file in question was generated by Quovadx Cloverleaf.

asked Dec 09 '22 20:12 by Benjamin Rubin


1 Answer

pyparsing is pretty handy for quick and simple parsing like this. A bare minimum would be something like:

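import pyparsing

# A bare word: any run of characters other than braces and whitespace.
string = pyparsing.CharsNotIn("{} \t\r\n")

# A group is either a brace-delimited list of groups or a bare word.
# Use <<= and parentheses: with plain <<, Python evaluates
# (group << Group(...)) | string and the "| string" alternative is lost.
group = pyparsing.Forward()
group <<= (pyparsing.Group(pyparsing.Literal("{").suppress() +
                           pyparsing.ZeroOrMore(group) +
                           pyparsing.Literal("}").suppress())
           | string)

toplevel = pyparsing.OneOrMore(group)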

Then use it as:

>>> toplevel.parseString(text).asList()
['protocol', 'sample_thread', [['AUTOSTART', '0'], ['BITMAP', 'thread.gif'], 
['COORDS', ['0', '0']], ['DATAFORMAT', [['TYPE', 'hl7'], ['PREPROCS', 
[['ARGS', [[]]], ['PROCS', 'sample_proc']]]]]]]

From there you can get as sophisticated as you want (parse numbers separately from strings, look for specific field names, etc.). The above is pretty general: it just looks for strings (defined as any run of non-whitespace characters other than "{" and "}") and {}-delimited lists of strings or further nested lists.
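
As a rough sketch of that kind of post-processing, the nested list shown above can be turned into dictionaries so fields can be looked up by name, with integer-looking strings converted to numbers. This part needs no pyparsing, only the plain list. The helper name to_tree is made up for illustration, and it relies on a heuristic that is an assumption about the format: a non-empty list whose items are all [name, value] pairs is treated as a mapping, and everything else stays a list.

```python
# The nested list produced by the parser for the sample file in the question.
parsed = ['protocol', 'sample_thread', [
    ['AUTOSTART', '0'], ['BITMAP', 'thread.gif'], ['COORDS', ['0', '0']],
    ['DATAFORMAT', [['TYPE', 'hl7'],
                    ['PREPROCS', [['ARGS', [[]]], ['PROCS', 'sample_proc']]]]]]]


def to_tree(node):
    """Convert parser output into dicts, lists, ints and strings (heuristic)."""
    if isinstance(node, str):
        # Treat integer-looking strings as numbers; keep everything else as text.
        try:
            return int(node)
        except ValueError:
            return node
    # Assumption: a non-empty list made up only of [name, value] pairs is a mapping.
    is_pairs = bool(node) and all(
        isinstance(item, list) and len(item) == 2 and isinstance(item[0], str)
        for item in node
    )
    if is_pairs:
        return {key: to_tree(value) for key, value in node}
    return [to_tree(item) for item in node]


kind, name, body = parsed
config = to_tree(body)
print(kind, name)                    # protocol sample_thread
print(config["DATAFORMAT"]["TYPE"])  # hl7
```

The heuristic cannot tell a mapping from a genuine list of two-element lists, so a real file with such values would need an explicit list of field names instead.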

answered Dec 29 '22 12:12 by Brian