List comprehension loop ordering depends on nesting [closed]

I'm looking for alternatives to using comprehensions for nested data structures, or for ways to get comfortable with nested list comprehensions if possible.

Without comprehensions, generating a list of items using a nested loop works like this:

combos = []
for a in iterable:
    for b in valid_posibilities(a):
        combos.append((a,b))

Turning this into a comprehension retains the order of the loops, which makes splitting it over multiple lines nice:

combos = [
    (a,b)
    for a in iterable
        for b in valid_posibilities(a)
    ]

However, this creates a single flat list. If I want code that produces a nested data structure, I would use something like this:

# same as above but instead of list of (a,b) tuples,
# I want a dictionary of {a:[b]} structure
combos_map = {}
for a in iterable:
    options = []
    for b in valid_posibilities(a):
        options.append(b)
    combos_map[a] = options

(The following snippet shows the equivalent code using plain lists, for those who haven't seen a dictionary comprehension before; meeting one for the first time in this oddly nested form is hard to follow.)

# for people unfamiliar with dictionary comprehensions
# this is the equivalent nesting structure
combos = []
for a in iterable:
    options = []
    for b in valid_posibilities(a):
        options.append(b)
    combos.append(options)

######## or equivalently
combos = [
      [b
        for b in valid_posibilities(a)
      ]
    for a in iterable
    ]

Now converting it to a comprehension we get this:

combos_map = {
    a:[b
        for b in valid_posibilities(a)
      ]
    for a in iterable
    }

What the heck? The order of the loops switched! This is because the inner loop has to go inside the inner list. If the order were simply always reversed for nested data structures I'd be fine, but conditions or non-nesting loops make it worse:

# for a list of files produce a mapping of {filename:(set of all words)}
# only in text files.
file_to_words_map = {}
for filename in list_of_files:
    if filename.endswith(".txt"):
        word_set = set()
        for line in open(filename):
            for word in line.split():
                word_set.add(word)
        file_to_words_map[filename] = word_set
        

### or using comprehension we get this lovely mess:

file_to_words_map = {
    filename: { word
            for line in open(filename)
               for word in line.split()
        }
    for filename in list_of_files
        if filename.endswith(".txt")
    }

I teach Python to beginners. On the occasions when someone wants to generate a nested data structure with comprehensions and I tell them 'it isn't worth it', I'd like to be able to send them here for a nicer explanation of why.

So, for the people I will send here, I'm looking for one of the following:

  1. Is there another way to refactor these kinds of loops that makes the code easier to follow, instead of just sticking them directly into comprehensions?

  2. Is there a way to interpret and construct these nested loops intuitively? At some point someone unfamiliar with Python comprehensions will stumble across code like the examples shown here and will hopefully end up here looking for some insight.

Tadhg McDonald-Jensen asked Nov 09 '20 00:11



2 Answers

Maybe the problem is that you are overusing list comprehensions. I love them too, but what purpose do they serve when the code becomes more convoluted than the loop?

If you want to stay with the list-comprehension-over-everything approach, you could factor away the inner loops to helper functions. This way it is much easier to digest:

def collect_words(file):
    ...

file_to_words_map = {
    filename: collect_words(open(filename))
    for filename in list_of_files if filename.endswith(".txt")
}

By the way, I don't think breaking such statements across multiple lines necessarily makes them clearer (if anything, the urge to do so is quite telling). In the example above I intentionally rejoined the for and if parts.
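For completeness, here is one way the elided collect_words helper might be filled in. This is only a sketch under an assumption the snippet above does not confirm: that the helper receives an iterable of lines (such as an open file) and returns the set of whitespace-separated words. An io.StringIO object stands in for a real file so the example runs on its own.

```python
import io

def collect_words(lines):
    """Return the set of whitespace-separated words in an iterable of lines.

    Assumed behaviour: the original helper body is elided, so this is a guess.
    """
    return {word for line in lines for word in line.split()}

# io.StringIO stands in for an open text file in this demo
sample = io.StringIO("the quick fox\nthe lazy dog\n")
words = collect_words(sample)
print(sorted(words))  # ['dog', 'fox', 'lazy', 'quick', 'the']
```

The nested comprehension has not disappeared, but it now lives in a small named function where its two for clauses read in the same order as the equivalent loops.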

ypnos answered Oct 05 '22 19:10


One approach is to use generators! Because you end up writing your code as statements instead of as a single expression, it ends up being much more extensible.

def words_of_file(filename):
    """Open the specified file and generate every word in it."""
    with open(filename) as file:
        for line in file:
            for word in line.split():
                yield word

def get_words_in_files(list_of_files):
    """Generate (filename, set of words) tuples for all text files in the given list."""
    for filename in list_of_files:
        # skip non-text files
        if not filename.endswith(".txt"):
            continue  # continue lets us skip instead of nesting everything

        # use a name that doesn't shadow the words_of_file function
        word_set = set(words_of_file(filename))
        # the dict constructor accepts (key, value) tuples
        yield (filename, word_set)

# assumes a.txt and b.txt exist in the current directory
file_to_words_map = dict(get_words_in_files(["a.txt", "b.txt", "image.png"]))

Using generators has a number of benefits:

  • We can use statements like with and continue, variable assignment, and debugging print statements, all because we are in a function body made of statements instead of inside a single expression.
  • words_of_file just generates the words; it doesn't dictate that they must be put into a set. Some other code may choose to iterate over the words directly or pass them to the list constructor instead. Maybe a collections.Counter would be useful too! The point is that the generator lets the caller decide how to use the sequence.
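The second point can be illustrated with a small in-memory sketch. Here words_of_lines is a hypothetical stand-in for words_of_file that takes a list of strings instead of a filename, so the example runs without any files on disk.

```python
from collections import Counter

def words_of_lines(lines):
    """Yield every whitespace-separated word from an iterable of lines.

    Hypothetical stand-in for words_of_file, working on in-memory strings.
    """
    for line in lines:
        for word in line.split():
            yield word

lines = ["spam eggs spam", "eggs ham"]

# The same generator feeds three different consumers.
unique_words = set(words_of_lines(lines))     # duplicates dropped
all_words = list(words_of_lines(lines))       # every word, in order of appearance
word_counts = Counter(words_of_lines(lines))  # how often each word occurs

print(all_words)            # ['spam', 'eggs', 'spam', 'eggs', 'ham']
print(word_counts["spam"])  # 2
```

Each call to words_of_lines creates a fresh generator, which is why it is called once per consumer instead of being stored and reused.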

This doesn't stop you from using comprehensions or other shortcuts either. If you want to yield all the elements of an iterator, you can just yield from that iterator, so you might end up with code like this:

def words_of_file(filename):
    """opens the file specified and generates all words present."""
    with open(filename) as file:
        for line in file:
            # produces all words of the line
            yield from line.split()
            
file_to_words_map = {filename:set(words_of_file(filename))
                      for filename in list_of_files
                         if filename.endswith(".txt")
                     }
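As a sanity check, the following sketch compares this helper-based comprehension with the explicit nested loop from the question (written here with a with block so the file is closed). The file names and contents are made up for the demo, and everything is written to a temporary directory that is removed afterwards.

```python
import os
import tempfile

def words_of_file(filename):
    """Open the given file and generate every word in it."""
    with open(filename) as file:
        for line in file:
            yield from line.split()

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical sample files; only the .txt ones should end up in the map
    contents = {
        "a.txt": "one two\ntwo three\n",
        "b.txt": "four\n",
        "image.png": "not text\n",
    }
    list_of_files = []
    for name, text in contents.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write(text)
        list_of_files.append(path)

    # helper-based comprehension
    by_comprehension = {
        filename: set(words_of_file(filename))
        for filename in list_of_files
        if filename.endswith(".txt")
    }

    # explicit nested loop
    by_loop = {}
    for filename in list_of_files:
        if filename.endswith(".txt"):
            word_set = set()
            with open(filename) as file:
                for line in file:
                    for word in line.split():
                        word_set.add(word)
            by_loop[filename] = word_set

    # key by base name so the result stays readable once the temp dir is gone
    result = {os.path.basename(k): v for k, v in by_comprehension.items()}

print(by_comprehension == by_loop)  # True
print(sorted(result))               # ['a.txt', 'b.txt']
```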

Different people have different opinions. My favourite is the generator-only option because I am a very big fan of generators. I'm sure some people like the one-liner solution of the nested comprehension, but this last version, which uses a simple comprehension and a helper function, is probably what most people would be most comfortable with.

Tadhg McDonald-Jensen answered Oct 05 '22 18:10