Identifying implicit string literal concatenation

Tags: python, string

According to Guido (and some other Python programmers), implicit string literal concatenation is considered harmful. Thus, I am trying to identify logical lines that contain such a concatenation.

My first (and only) attempt used shlex: I thought of splitting a logical line with posix=False, so that the parts enclosed in quotes are identified; if two such parts lie next to each other, the line is considered to contain a "literal concatenation".

However, this fails on triple-quoted (multiline) strings, as the following example shows:

shlex.split('""" Some docstring """', posix=False)
# Returns ['""', '" Some docstring "', '""'], which would be flagged as harmful even though it is not
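
For reference, the shlex-based check described above could be sketched roughly as follows. This is only an illustration of the idea, not the asker's actual code; the helper names `is_quoted` and `has_adjacent_quoted_parts` are made up for this example.

```python
import shlex


def is_quoted(part):
    # In non-POSIX mode shlex keeps the surrounding quotes on each token.
    return len(part) >= 2 and part[0] in "\"'" and part[-1] == part[0]


def has_adjacent_quoted_parts(logical_line):
    """Return True if two quoted shlex tokens sit next to each other."""
    parts = shlex.split(logical_line, posix=False)
    return any(is_quoted(a) and is_quoted(b) for a, b in zip(parts, parts[1:]))


# A genuine implicit concatenation is flagged ...
print(has_adjacent_quoted_parts('x = "a" "b"'))
# ... but so is a plain docstring, because shlex knows nothing about
# Python's triple-quoted strings and splits it into three quoted parts.
print(has_adjacent_quoted_parts('""" Some docstring """'))
```

Both calls report a hit, which is exactly the false positive on the docstring.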

I can tweak this in some weird ad hoc ways, but I wondered whether you can think of a simple solution. My intention is to add it to my already extended pep8 verifier.

Bach asked Feb 04 '14 07:02

1 Answer

Interesting question. I just had to play with it, and since there is no answer yet, I'm posting my solution to the problem:

#!/usr/bin/env python3

import sys
import token
import tokenize

with tokenize.open(sys.argv[1]) as f:  # honours the file's encoding declaration
    toks = list(tokenize.generate_tokens(f.readline))

# Skip NL and COMMENT tokens: they may sit between strings inside brackets.
toks = [t for t in toks if t[0] not in (tokenize.NL, tokenize.COMMENT)]
for tok, tok2 in zip(toks, toks[1:]):
    if tok[0] == token.STRING and tok2[0] == token.STRING:
        print("implicit concatenation in line "
              "{} between {} and {}".format(tok[2][0], tok[1], tok2[1]))

You can feed the program to itself, and the result should be:

implicit concatenation in line 14 between "implicit concatenation in line " and "{} between {} and {}"
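
Since the question mentions plugging this into a pep8-style checker, here is a minimal sketch of the same token-based idea wrapped in a function that takes source text instead of a file name. The names `find_implicit_concatenations`, `SKIPPED` and `sample` are invented for this illustration, and the sketch only looks at plain STRING tokens, so treat it as a starting point rather than a finished check.

```python
import io
import tokenize

# Tokens that may sit between two string literals without separating them.
SKIPPED = (tokenize.NL, tokenize.COMMENT)


def find_implicit_concatenations(source):
    """Yield (line, first, second) for each pair of adjacent string literals."""
    readline = io.StringIO(source).readline
    previous = None
    for tok in tokenize.generate_tokens(readline):
        if tok.type in SKIPPED:
            continue
        if (previous is not None
                and previous.type == tokenize.STRING
                and tok.type == tokenize.STRING):
            yield previous.start[0], previous.string, tok.string
        previous = tok


sample = "\n".join([
    '""" Some docstring """',  # one STRING token, must not be flagged
    'x = ("a"',                # concatenation split across two lines
    '     "b")',
    'y = "c" + "d"',           # explicit concatenation, must not be flagged
    '',
])
for hit in find_implicit_concatenations(sample):
    print(hit)
```

The tokenizer treats the triple-quoted docstring as a single STRING token, so only the `"a"` / `"b"` pair on line 2 is reported. The NEWLINE token that ends each logical line is deliberately not skipped, which keeps string literals on separate logical lines from being paired up.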

hochl answered Oct 17 '22 03:10