Easy way to parse .h file for comments using Python?

What is an easy way to parse a C .h file for comments and entity names using Python?

The extracted content will then be written into a Word document; that part is already implemented.

Source comments are formatted using simple tag-style rules. The tags make it easy to distinguish one entity's documenting comment from another's, and from non-documenting comments. A comment may span multiple lines. Each comment sits directly above the entity definition it documents:

//ENUM My comment bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
//     could be multi-line. Bla bla bla bla bla bla bla bla bla.
enum my_enum
{
    //EITEM My enum item 1.
    //      Just could be multi-line too.
    MY_ENUM_ITEM_1,

    //EITEM My enum item 2
    MY_ENUM_ITEM_2,
};

//STRUCT My struct
struct my_struct {

    //MEMBER struct member 1
    int m_1_;
};

//FUNC my function 1 description.
//     Could be multi-line also.
//INPUT  arg1 - first argument
//RETURN pointer to an allocated my_struct instance.
my_struct* func_1(int arg1);

The result of this parsing should be a tree of code entities with their comments.

How can this be done quickly and without third-party libraries?

Brian Cannard asked Feb 28 '23 21:02


2 Answers

This has already been done. Several times over.

Here is a parser for the C language written in Python. Start with this.

http://wiki.python.org/moin/SeeGramWrap

Other parsers are listed here:

http://wiki.python.org/moin/LanguageParsing

http://nedbatchelder.com/text/python-parsers.html

You could probably download any ANSI C Yacc grammar and rework it into PLY format without too much trouble and use that as a jumping-off point.

S.Lott answered Mar 13 '23 02:03


Here's a quick and dirty solution. It won't handle comment markers that appear inside string literals, but since this is just for header files that shouldn't be an issue.

import sys

S_CODE, S_INLINE, S_MULTLINE = range(3)

# Header files are small, so read the whole file at once.
with open(sys.argv[1]) as f:
    text = f.read()

state = S_CODE
comments = ''
i = 0
while i < len(text):
    c = text[i]
    nxt = text[i + 1:i + 2]  # next character, or '' at end of file
    if state == S_CODE:
        if c == '/' and nxt == '*':
            state = S_MULTLINE
            i += 1  # skip the '*'
        elif c == '/' and nxt == '/':
            state = S_INLINE
            i += 1  # skip the second '/'
    elif state == S_INLINE:
        comments += c
        if c == '\n':
            state = S_CODE  # assignment; a comparison here would never leave the state
    elif state == S_MULTLINE:
        if c == '*' and nxt == '/':
            comments += '\n'
            state = S_CODE
            i += 1  # skip the closing '/'
        else:
            comments += c
    i += 1
print(comments)
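
To get from raw comment text to the code-and-comments tree the question asks for, a line-based pass may be enough for headers that follow the tag rules shown above. The following is only a minimal sketch under those assumptions: the names parse_header, dump and KNOWN_TAGS and the node layout are made up for illustration, the tag list is copied from the question's example, and /* */ comments, several braces on one line, and untagged // comments placed directly after a tagged one are not handled.

```python
import re

# Tags taken from the example in the question; extend as needed (assumption).
KNOWN_TAGS = {'ENUM', 'EITEM', 'STRUCT', 'MEMBER', 'FUNC', 'INPUT', 'RETURN'}
TAG_RE = re.compile(r'^//([A-Z]+)\b\s*(.*)$')   # "//ENUM My comment"
CONT_RE = re.compile(r'^//\s+(.*)$')            # "//     continued text"


def parse_header(text):
    """Return a list of nodes shaped like
    {'tags': [[tag, text], ...], 'code': str, 'children': [...]}.
    (Hypothetical layout, chosen for this sketch.)"""
    root = {'tags': [], 'code': None, 'children': []}
    stack = [root]   # stack[-1] is the innermost open scope
    pending = []     # tagged comments waiting for their entity line
    last = None      # node created by the previous code line, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = TAG_RE.match(line)
        if m and m.group(1) in KNOWN_TAGS:
            pending.append([m.group(1), m.group(2)])
            continue
        m = CONT_RE.match(line)
        if m and pending:
            pending[-1][1] += ' ' + m.group(1)   # continuation of a multi-line comment
            continue
        if line.startswith('//'):
            continue                              # non-documenting comment
        if pending:
            # The first code line after a tagged block is the documented entity.
            last = {'tags': pending, 'code': line.rstrip('{;, '), 'children': []}
            stack[-1]['children'].append(last)
            pending = []
        elif not line.startswith('{'):
            last = None                           # undocumented code line
        if '{' in line:
            # Enter the body of the entity just seen, else stay in the current scope.
            stack.append(last if last is not None else stack[-1])
        if '}' in line and len(stack) > 1:
            stack.pop()
            last = None
    return root['children']


def dump(nodes, depth=0):
    """Print the tree, indenting two spaces per nesting level."""
    for node in nodes:
        print('  ' * depth + node['code'])
        for tag, text in node['tags']:
            print('  ' * depth + '  [' + tag + '] ' + text)
        dump(node['children'], depth + 1)
```

Calling dump(parse_header(text)) on the contents of a header prints each documented entity with its tagged comments, and enum items and struct members appear nested under their parent. Restricting matches to KNOWN_TAGS is what keeps ordinary comments such as //TODO out of the tree.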
eduffy answered Mar 13 '23 01:03