I would like to list all strings within my large Python project.
Consider the different ways to create a string in Python:
mystring = "hello world"
mystring = ("hello "
"world")
mystring = "hello " \
"world"
I need a tool that outputs "filename, linenumber, string" for each string in my project. Strings that are spread over multiple lines using "\" or "('')" should be shown on a single line.
Any ideas how this could be done?
unwind's suggestion of using the ast module in Python 2.6 is a good one. (There's also the undocumented _ast module in 2.5.) Here's example code for that:
code = """a = 'blah'
b = '''multi
line
string'''
c = u"spam"
"""
import ast
root = ast.parse(code)
class ShowStrings(ast.NodeVisitor):
    def visit_Str(self, node):
        print "string at", node.lineno, node.col_offset, repr(node.s)
show_strings = ShowStrings()
show_strings.visit(root)
The problem is multiline strings. If you run the above, you'll get:
string at 1 4 'blah'
string at 4 -1 'multi\nline\nstring'
string at 5 4 u'spam'
As you can see, it doesn't report the start of the multiline string, only the line where it ends (with a column offset of -1). There's no good solution for that using the builtin Python tools.
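That limitation belongs to the Python 2 ast module used above. As a hedged, modern sketch (assuming Python 3.8 or later, where string literals parse to ast.Constant and a multiline literal's lineno is the line it starts on), the same idea reports every string at its first line. The parser also folds implicitly concatenated literals, such as ("hello " "world") or a backslash-continued pair, into a single node, which matches what the question asks for. The helper name string_literals is illustrative only.

```python
import ast


def string_literals(source):
    """Return (lineno, value) for every str literal in source, in reading order.

    Hypothetical helper; assumes Python 3.8+ (ast.Constant with start positions).
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        # Numbers, bytes, None, etc. are also ast.Constant: keep only text.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            found.append((node.lineno, node.col_offset, node.value))
    # ast.walk is breadth-first, so sort back into source order.
    found.sort()
    return [(lineno, value) for lineno, _, value in found]


# The three forms from the question, plus a triple-quoted string.
code = '''a = "hello world"
b = ("hello "
     "world")
c = "hello " \\
    "world"
d = """multi
line"""
'''

for lineno, value in string_literals(code):
    print(lineno, repr(value))
```

Each concatenated pair comes back as one 'hello world' entry reported on the line where it begins, and the triple-quoted string is reported on line 6 rather than line 7.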
Another option is to use my python4ply module. It is a grammar definition for Python for PLY, a parser generator. Here's how you might use it:
import compiler
import compiler.visitor
# from python4ply; requires the ply parser generator
import python_yacc
code = """a = 'blah'
b = '''multi
line
string'''
c = u"spam"
d = 1
"""
tree = python_yacc.parse(code, "<string>")
#print tree
class ShowStrings(compiler.visitor.ASTVisitor):
    def visitConst(self, node):
        if isinstance(node.value, basestring):
            print "string at", node.lineno, repr(node.value)
visitor = ShowStrings()
compiler.walk(tree, visitor)
The output from this is:
string at 1 'blah'
string at 2 'multi\nline\nstring'
string at 5 u'spam'
There's no support for column information. (There is some mostly complete, commented-out code to support that, but it's not fully tested.) Then again, I see you don't need it. It also means working with Python's compiler module, which is clumsier than the ast module.
Still, with 30-40 lines of code you should have exactly what you want.
Python's included tokenize module will also do the trick:
from __future__ import with_statement
import sys
import tokenize
for filename in sys.argv[1:]:
    with open(filename) as f:
        for toktype, tokstr, (lineno, _), _, _ in tokenize.generate_tokens(f.readline):
            if toktype == tokenize.STRING:
                strrepr = repr(eval(tokstr))
                print filename, lineno, strrepr
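Note that tokenize reports each literal separately, so ("hello " "world") comes out as two STRING tokens rather than the single entry the question asks for. One possible refinement, sketched here for Python 3 with a hypothetical helper named merged_strings, is to collect runs of consecutive STRING tokens (ignoring the NL and COMMENT tokens that can appear between them inside brackets) and let ast.literal_eval perform the concatenation. It assumes plain string literals: on Python versions before 3.12 an f-string is kept as raw source text, and on 3.12+ f-strings are tokenized differently and are not reported at all.

```python
import ast
import io
import tokenize

# Tokens that may legally sit between two implicitly concatenated literals.
SKIPPABLE = {tokenize.NL, tokenize.COMMENT}


def merged_strings(source):
    """Return (lineno, value) pairs, merging adjacent STRING tokens."""
    results = []
    parts = []    # source text of the literals in the current run
    start = None  # line on which the current run began

    def flush():
        nonlocal parts, start
        if parts:
            text = " ".join(parts)
            try:
                # literal_eval applies Python's own implicit concatenation.
                value = ast.literal_eval(text)
            except (ValueError, SyntaxError):
                value = text  # e.g. an f-string on Python < 3.12: keep raw text
            results.append((start, value))
        parts, start = [], None

    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            if not parts:
                start = tok.start[0]
            parts.append(tok.string)
        elif tok.type not in SKIPPABLE:
            flush()  # any other token (NEWLINE, OP, NAME, ...) ends the run
    flush()
    return results
```

A backslash continuation produces no token at all, so "hello " \ followed by "world" on the next line is merged too. To keep the original command-line behaviour, read each file named in sys.argv and print the filename, lineno and repr(value) for every pair returned.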
If you can do this in Python, I'd suggest starting with the ast (Abstract Syntax Tree) module and going from there.
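Going from there, a whole-project version of the "filename, linenumber, string" report might look like the following sketch. It assumes Python 3.8+, treats every *.py file under the given directories as part of the project, and skips files that cannot be read or parsed (Python 2-only files, for example). StringCollector and scan_project are hypothetical names. Printing repr(value) keeps strings that span several source lines on one output line. Skipping f-strings entirely is a design choice you may want to revisit.

```python
import ast
import pathlib
import sys


class StringCollector(ast.NodeVisitor):
    """Collect str literals, in the spirit of the ShowStrings visitor above."""

    def __init__(self):
        self.found = []

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            self.found.append((node.lineno, node.col_offset, node.value))

    def visit_JoinedStr(self, node):
        # f-strings: deliberately not descended into, so their fragments
        # are not reported as standalone strings.
        pass


def scan_project(root):
    """Yield (path, lineno, value) for every str literal in *.py files under root."""
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (OSError, SyntaxError, UnicodeDecodeError, ValueError):
            continue  # unreadable or unparsable file: skipped, not reported
        collector = StringCollector()
        collector.visit(tree)
        for lineno, _, value in sorted(collector.found):
            yield path, lineno, value


if __name__ == "__main__":
    # Usage: python list_strings.py DIR [DIR ...]
    for root in sys.argv[1:]:
        for path, lineno, value in scan_project(root):
            print(f"{path}, {lineno}, {value!r}")
```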
Are you asking about the I18N utilities in Python?
http://docs.python.org/library/gettext.html#internationalizing-your-programs-and-modules
There's a utility called po-utils (formerly xpot) that can help with this.
http://po-utils.progiciels-bpi.ca/README.html
You might also consider parsing your code with Pygments.
I don't know about the other solutions, but Pygments is certainly very simple to use.