
Automatically simplifying/refactoring Python code (e.g. for loops -> list comprehension)? [closed]

In Python, I really enjoy how concise an implementation can be when using list comprehensions. I love to write concise list comprehensions like this:

myList = [1, 5, 11, 20, 30, 35] #input data
bigNumbers = [x for x in myList if x > 10]

However, I often encounter more verbose implementations like this:

myList = [1, 5, 11, 20, 30, 35] #input data
bigNumbers = []
for i in xrange(0, len(myList)):
    if myList[i] > 10:
        bigNumbers.append(myList[i])

When a for loop only looks through one data structure (e.g. myList[]), there is usually a straightforward list comprehension statement that is equivalent to the loop.
With this in mind, is there a refactoring tool that converts verbose Python loops into concise list comprehension statements?


Previous StackOverflow questions have asked for advice on transforming loops into list comprehension. But, I have yet to find a question about automatically converting loops into list comprehension expressions.


Motivation: There are numerous ways to answer the question "what does it mean for code to be clean?" Personally, I find that making code concise and getting rid of some of the fluff tends to make code cleaner and more readable. Naturally there's a line in the sand between "concise code" and "incomprehensible one-liners." Still, I often find it satisfying to write and work with concise code.

solvingPuzzles asked Jan 25 '13 07:01


1 Answer

2to3 is a refactoring tool that can perform arbitrary refactorings, as long as you can specify them with a syntactic pattern. The pattern you might want to look for is this:

VARIABLE1 = []
for VARIABLE2 in EXPRESSION1:
    if EXPRESSION2:
        VARIABLE1.append(EXPRESSION3)

This can be refactored safely to

VARIABLE1 = [EXPRESSION3 for VARIABLE2 in EXPRESSION1 if EXPRESSION2]
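To make the pattern concrete, here is a minimal sketch of the same rewrite. It is not a 2to3 fixer: it uses Python 3's standard ast module (ast.unparse needs Python 3.9 or later), and the names loop_to_comprehension and _match are made up for illustration. It only handles an empty-list assignment immediately followed by the loop, and it does not check the remaining safety conditions (for example, that the loop variable is not used after the loop, or that the expressions do not refer to VARIABLE1).

```python
import ast


def _match(init, loop):
    """Return a list-comprehension assignment if (init, loop) fits the pattern, else None."""
    # VARIABLE1 = []
    if not (isinstance(init, ast.Assign) and len(init.targets) == 1
            and isinstance(init.targets[0], ast.Name)
            and isinstance(init.value, ast.List) and not init.value.elts):
        return None
    # for VARIABLE2 in EXPRESSION1:  (single statement in the body, no else clause)
    if not (isinstance(loop, ast.For) and not loop.orelse and len(loop.body) == 1):
        return None
    # if EXPRESSION2:  (single statement in the body, no else clause)
    test = loop.body[0]
    if not (isinstance(test, ast.If) and not test.orelse and len(test.body) == 1):
        return None
    # VARIABLE1.append(EXPRESSION3)
    stmt = test.body[0]
    if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)):
        return None
    call = stmt.value
    func = call.func
    if not (isinstance(func, ast.Attribute) and func.attr == "append"
            and isinstance(func.value, ast.Name)
            and func.value.id == init.targets[0].id
            and len(call.args) == 1 and not call.keywords):
        return None
    comp = ast.ListComp(
        elt=call.args[0],
        generators=[ast.comprehension(target=loop.target, iter=loop.iter,
                                      ifs=[test.test], is_async=0)],
    )
    return ast.copy_location(ast.Assign(targets=[init.targets[0]], value=comp), init)


def loop_to_comprehension(source):
    """Rewrite top-level 'X = []' + filtered-append loops in source; return new source."""
    tree = ast.parse(source)
    old_body, new_body = tree.body, []
    i = 0
    while i < len(old_body):
        nxt = old_body[i + 1] if i + 1 < len(old_body) else None
        replacement = _match(old_body[i], nxt)
        if replacement is not None:
            new_body.append(replacement)
            i += 2  # the assignment and the loop were both consumed
        else:
            new_body.append(old_body[i])
            i += 1
    tree.body = new_body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


example = """
bigNumbers = []
for i in range(0, len(myList)):
    if myList[i] > 10:
        bigNumbers.append(myList[i])
"""
print(loop_to_comprehension(example))
```

Note that the sketch round-trips through the AST, so comments and original formatting are lost; a tool that has to preserve them would need a concrete syntax tree instead.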

In your specific example, this would give

bigNumbers = [myList[i] for i in xrange(0, len(myList)) if myList[i] > 10]

Then, you can have another refactoring that replaces xrange(0, N) with xrange(N), and another one that replaces

[VARIABLE1[VARIABLE2] for VARIABLE2 in xrange(len(VARIABLE1)) if EXPRESSION1]

with

[VARIABLE3 for VARIABLE3 in VARIABLE1 if EXPRESSION1PRIME]
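The substitution needed for this last step could look roughly like the following sketch. Again this uses Python 3's ast module (3.9 or later) rather than 2to3, and IndexToElement and rewrite_condition are hypothetical names. It replaces each VARIABLE1[VARIABLE2] with VARIABLE3 and refuses the rewrite if VARIABLE1 or VARIABLE2 still occurs afterwards; rejecting leftover uses of the index variable is an extra check assumed here, since the index no longer exists after the rewrite.

```python
import ast


class IndexToElement(ast.NodeTransformer):
    """Replace every read of seq[idx] with the bare name elem."""

    def __init__(self, seq, idx, elem):
        self.seq, self.idx, self.elem = seq, idx, elem

    def visit_Subscript(self, node):
        if (isinstance(node.value, ast.Name) and node.value.id == self.seq
                and isinstance(node.slice, ast.Name) and node.slice.id == self.idx
                and isinstance(node.ctx, ast.Load)):
            return ast.copy_location(ast.Name(id=self.elem, ctx=ast.Load()), node)
        return self.generic_visit(node)


def rewrite_condition(expr_source, seq, idx, elem):
    """Return the rewritten expression source, or None if the rewrite is not safe."""
    tree = ast.parse(expr_source, mode="eval")
    names_before = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    if elem in names_before:
        return None  # the new name would clash with an existing one
    tree = IndexToElement(seq, idx, elem).visit(tree)
    names_after = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    if names_after & {seq, idx}:
        return None  # the list or the index is still used on its own
    return ast.unparse(tree.body)


print(rewrite_condition("myList[i] > 10", "myList", "i", "x"))           # x > 10
print(rewrite_condition("myList[i] > len(myList)", "myList", "i", "x"))  # None
```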

There are several problems with this refactoring:

  • EXPRESSION1PRIME must be EXPRESSION1 with all occurrences of VARIABLE1[VARIABLE2] replaced by VARIABLE3. This is possible with 2to3, but requires explicit code to do the traversal and replacement.
  • EXPRESSION1PRIME then must not contain any further occurrences of VARIABLE1. This can also be checked with explicit code.
  • One needs to come up with a name for VARIABLE3. You have chosen x; there is no reasonable way to have this done automatically. You could choose to recycle VARIABLE2 (i.e. i) for that, but that may be confusing as it suggests that i is still an index. It might work to pick a synthetic name, such as VARIABLE1_VARIABLE2 (i.e. myList_i), and check that it is not used elsewhere.
  • One needs to be sure that VARIABLE1[VARIABLE2] yields the same values, in the same order, as iterating over iter(VARIABLE1). It's not possible to check this automatically.
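As a small illustration of the last point, here is a case where indexing and iteration disagree. The helper names by_index and by_iteration are made up for this example; the dict is chosen so that container[i] works for every i in range(len(container)), yet iterating over it yields keys, not values.

```python
def by_index(container):
    # The "verbose" form: index into the container.
    return [container[i] for i in range(len(container)) if container[i] > 10]


def by_iteration(container):
    # The "refactored" form: iterate over the container directly.
    return [x for x in container if x > 10]


as_list = [1, 5, 11, 20, 30, 35]
as_dict = {0: 11, 1: 5, 2: 30}  # integer keys 0..len-1, so indexing succeeds

print(by_index(as_list) == by_iteration(as_list))   # True
print(by_index(as_dict), by_iteration(as_dict))     # [11, 30] []
```

A purely syntactic tool cannot tell these two cases apart, because the type of the container is not visible in the pattern.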

If you want to learn how to write 2to3 fixers, take a look at Lennart Regebro's book.

Martin v. Löwis answered Sep 25 '22 15:09