
Parse Java Source Files with Python [closed]

I have a bunch of Java source files. I need to write a Python script that goes through the source files and identifies all string literals and their locations.

The problem is the strings could be in a couple of different forms such as:

  1. String literal - "Hello World"
  2. Combination of literals - "Hello" + "World"

I have come up with a couple of ideas to accomplish this:

  1. Go line by line through the source files looking for " and using that to identify the location of a string
  2. Use a regular expression

Do you have any comments on the approaches I suggested, or is there another method I have not thought of?

In case you're wondering, we're doing internationalization on our code base. That's why I am trying to automate this process.

user489041 asked Dec 08 '25 23:12



2 Answers

Using the re module is the quickest solution.

You can use re.finditer(), which yields a match object for each match, giving you both the matched text and its position.

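>>> import re
>>> text = "He was carefully disguised but captured quickly by police."  # sample input (assumed)
>>> for m in re.finditer(r"\w+ly", text):
...     print('%02d-%02d: %s' % (m.start(), m.end(), m.group(0)))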
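
Applied to the question, a pattern for Java string literals might look like the sketch below. The pattern and the helper name find_string_literals are assumptions made for illustration and were not part of the original answer. The pattern handles backslash escapes, but it knows nothing about comments, character literals, or text blocks, so a " inside a comment would still be reported as a string.

```python
import re

# Assumed pattern: an opening quote, then any run of escaped characters or
# characters that are not a quote, backslash, or newline, then a closing quote.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\\n])*"')


def find_string_literals(source):
    """Yield (line, column, literal) for each match; line and column are 1-based."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in STRING_LITERAL.finditer(line):
            yield lineno, m.start() + 1, m.group(0)


# Hypothetical two-line Java snippet used as sample input.
java_source = 'String a = "Hello World";\nString b = "say \\"hi\\"" + "there";\n'
for lineno, col, literal in find_string_literals(java_source):
    print('%d:%d: %s' % (lineno, col, literal))
```

Scanning line by line keeps the line and column bookkeeping trivial, at the cost of ignoring anything that spans more than one line.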
igni answered Dec 11 '25 12:12



Another option is PLY, a pure-Python implementation of lex and yacc. It was written by David Beazley, who has some slides that demonstrate the functionality. This would require a BNF grammar describing the syntax you are parsing. I'm not sure you want to go that far.

If you don't want to write a BNF grammar, pyparsing is another choice.
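
Whichever tool you pick, the advantage of a lexer over a bare pattern is that it treats comments and character literals as tokens of their own, so a quote inside them is not mistaken for the start of a string. The sketch below is neither PLY nor pyparsing; it only imitates the lexer idea with the standard re module and one alternation of named groups. The token names, the scan_strings helper, and the rule that literals joined by a single + are reported as one group are all assumptions made for illustration, and Java text blocks are not handled.

```python
import re

# One alternation, tried left to right at each position, in the spirit of a lexer spec.
TOKEN = re.compile(r'''
    (?P<COMMENT>//[^\n]*|/\*.*?\*/)    # line comment or block comment
  | (?P<CHAR>'(?:\\.|[^'\\\n])*')      # character literal
  | (?P<STRING>"(?:\\.|[^"\\\n])*")    # string literal
''', re.VERBOSE | re.DOTALL)


def scan_strings(source):
    """Return a list of groups. Each group is a list of (line, column, literal)
    tuples for string literals that are joined by a single + operator."""
    groups = []
    prev_end = None  # end offset of the previous string literal, if any
    for m in TOKEN.finditer(source):
        if m.lastgroup != 'STRING':
            continue  # comments and character literals are skipped
        line = source.count('\n', 0, m.start()) + 1
        col = m.start() - (source.rfind('\n', 0, m.start()) + 1) + 1
        item = (line, col, m.group())
        # Same group only if nothing but whitespace and one + lies in between.
        if prev_end is not None and source[prev_end:m.start()].strip() == '+':
            groups[-1].append(item)
        else:
            groups.append([item])
        prev_end = m.end()
    return groups


# Hypothetical Java snippet used as sample input.
sample = "\n".join([
    '// "not a string"',
    "char q = '\"';",
    'String s = "Hello" + "World";',
    '/* "also ignored" */ String t = "Bye";',
])
for group in scan_strings(sample):
    print(group)
```

Grouping concatenated literals matters for internationalization, because "Hello" + "World" usually needs to be translated as one message. More involved expressions, such as literals joined with variables, would need a real parser.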

Mike Pennington answered Dec 11 '25 13:12



