Given a code base (for example, a large C or Objective-C project), I would like to analyze the source files and pick out symbols of interest. These might be class declarations, variable names or types, or method names. Is there a Python module that could help me with this?
The only approach I can see is to use regular expressions to gather these symbols, but I suspect that could get very ugly very quickly. I'm also not an expert in compilers or parsers, so something lighter-weight would be preferable.
Thanks for any suggestions.
------ update -----
Thanks for all of the suggestions so far; there are definitely some promising leads. One other avenue that may be possible: what if I were able to compile the project I am trying to analyze? Would the debugging symbols (dSYM) make this process any easier? I'm not looking for anything advanced, just a list of classes with their ivar and method names. At this point, looking into the suggested parsing tools seems like more work than I can afford to invest in this project right now.
Regular expressions are definitely not a good way to examine programming language code. I would suggest choosing a parsing module from the links provided below. There are a few tools out there that you could use, and they all provide parsing facilities that you can build your own code on top of.
pygccxml generates an XML description from C++ source files. This might be closer to what you are trying to do.
Also look at this; it generates a navigable class tree representing the class structure.
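To give a feel for how little code sits on top of an XML description like the one pygccxml works with, here is a minimal sketch using only the standard library. The sample document is hand-written and heavily simplified; it assumes the GCC-XML style layout in which a `Class` element lists its member ids in a `members` attribute and `Field` and `Method` elements carry a `name`. The function name `list_class_members` and the `Shape` class are made up for illustration, and real output from your toolchain may differ in detail.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the style of GCC-XML output (an assumption,
# not captured from a real run).
SAMPLE_XML = """
<GCC_XML>
  <Namespace id="_1" name="::" members="_3 "/>
  <Class id="_3" name="Shape" context="_1" members="_4 _5 _6 "/>
  <Field id="_4" name="width" type="_7" context="_3" access="private"/>
  <Method id="_5" name="area" returns="_7" context="_3" access="public"/>
  <Constructor id="_6" name="Shape" context="_3" access="public"/>
  <FundamentalType id="_7" name="int"/>
</GCC_XML>
"""


def list_class_members(xml_text):
    """Return {class_name: {"fields": [...], "methods": [...]}}."""
    root = ET.fromstring(xml_text)

    # Index every declaration by its id so member references can be resolved.
    by_id = {el.get("id"): el for el in root if el.get("id")}

    summary = {}
    for el in root:
        if el.tag not in ("Class", "Struct"):
            continue
        fields, methods = [], []
        # "members" is a space-separated list of ids belonging to this class.
        for member_id in el.get("members", "").split():
            member = by_id.get(member_id)
            if member is None:
                continue
            if member.tag == "Field":
                fields.append(member.get("name"))
            elif member.tag == "Method":
                methods.append(member.get("name"))
        summary[el.get("name")] = {"fields": fields, "methods": methods}
    return summary


if __name__ == "__main__":
    for class_name, members in list_class_members(SAMPLE_XML).items():
        print(class_name, members)
```

Constructors are skipped here on purpose; add `"Constructor"` to the method branch if you want them listed. Since the tool behind this format targets C++, this sketch would not cover the Objective-C parts of a project.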