I'm looking for alternatives to using comprehensions for nested data structures or ways to get comfortable with nested list comprehensions if possible.
Without comprehensions, generating a list of items with a nested loop works like this:
combos = []
for a in iterable:
    for b in valid_posibilities(a):
        combos.append((a, b))
Turning this into a comprehension keeps the loops in the same order, which makes a multi-line layout read nicely:
combos = [
    (a, b)
    for a in iterable
    for b in valid_posibilities(a)
]
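As a quick check that the two forms agree, here is a small sketch with concrete stand-ins. The sample iterable (range(1, 4)) and the body of valid_posibilities are assumptions made up for the demonstration, not part of the original code.

```python
def valid_posibilities(a):
    # Hypothetical stand-in: every non-negative integer smaller than a.
    return range(a)

iterable = range(1, 4)  # assumed sample input

# Nested-loop version.
combos_loop = []
for a in iterable:
    for b in valid_posibilities(a):
        combos_loop.append((a, b))

# Comprehension version: the for clauses appear in the same order as the loops.
combos_comp = [
    (a, b)
    for a in iterable
    for b in valid_posibilities(a)
]

print(combos_loop == combos_comp)  # True
```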
However, this creates a single flat list. If I want code that produces a nested data structure, I would write something like this:
# same as above but instead of list of (a,b) tuples,
# I want a dictionary of {a:[b]} structure
combos_map = {}
for a in iterable:
    options = []
    for b in valid_posibilities(a):
        options.append(b)
    combos_map[a] = options
(For those who haven't seen a dictionary comprehension before, the following snippet shows the equivalent code using plain lists, since meeting one for the first time in nested form can be hard to follow.)
# for people unfamiliar with dictionary comprehensions,
# this is the equivalent nesting structure
combos = []
for a in iterable:
    options = []
    for b in valid_posibilities(a):
        options.append(b)
    combos.append(options)
######## or equivalently
combos = [
    [b
        for b in valid_posibilities(a)
    ]
    for a in iterable
]
Now converting it to a comprehension we get this:
combos_map = {
    a: [b
        for b in valid_posibilities(a)
    ]
    for a in iterable
}
What the heck? The order of the loops switched! This is because the inner loop has to go inside the inner list. If the order were simply always reversed for nested data structures I'd be fine with it, but conditions or non-nesting loops make it worse:
# for a list of files produce a mapping of {filename:(set of all words)}
# only in text files.
file_to_words_map = {}
for filename in list_of_files:
    if filename.endswith(".txt"):
        word_set = set()
        for line in open(filename):
            for word in line.split():
                word_set.add(word)
        file_to_words_map[filename] = word_set
### or using comprehension we get this lovely mess:
file_to_words_map = {
    filename: { word
        for line in open(filename)
        for word in line.split()
    }
    for filename in list_of_files
    if filename.endswith(".txt")
}
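For readers comparing the flat and nested forms above, one possible way to see the evaluation order is sketched here. The inner comprehension is only the value expression, evaluated once per iteration of the outer loop, so giving it a name puts the outer loop visibly first again. The sample iterable, the body of valid_posibilities, and the helper options_for are all hypothetical stand-ins.

```python
def valid_posibilities(a):
    # Hypothetical stand-in: every non-negative integer smaller than a.
    return range(a)

def options_for(a):
    # The "inner" comprehension, given a name (hypothetical helper).
    return [b for b in valid_posibilities(a)]

iterable = range(1, 4)  # assumed sample input

# Nested form: the inner comprehension is written first but runs once per outer item.
nested = {
    a: [b for b in valid_posibilities(a)]
    for a in iterable
}

# Same result with the inner part named; only the outer loop remains in view.
named = {a: options_for(a) for a in iterable}

print(nested == named)  # True
```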
I teach Python to beginners. On the occasions when someone wants to generate a nested data structure with comprehensions and I tell them 'it isn't worth it', I'd like to be able to send them here for a nicer explanation of why.
So, for the people I will send here, I'm looking for one of the following:
Is there another way to refactor these kinds of loops that makes the code easier to follow, instead of just sticking them directly into comprehensions?
Is there a way to interpret and construct these nested loops intuitively? At some point someone who is not familiar with Python comprehensions will stumble across code like the examples shown here and hopefully end up here looking for some insight.
Yes, a list comprehension preserves the order of the original iterable (if it has one). If the original iterable is ordered (list, tuple, file, etc.), that's the order you'll get in the result. A dict iterates in insertion order (guaranteed since Python 3.7), while a set gives no guarantees about the order of its items.
As it turns out, you can nest list comprehensions within another list comprehension to shorten your code further. The language places no limit on how deeply comprehensions can be nested, which makes it possible to write very complex code in a single line, although readability tends to drop quickly as the nesting gets deeper.
List comprehensions are also more declarative than loops, which means they're easier to read and understand. Loops require you to focus on how the list is created. You have to manually create an empty list, loop over the elements, and add each of them to the end of the list.
List comprehensions are the right tool for creating lists, although for a plain run of integers list(range(n)) is simpler still. For loops are the right tool for performing computations or running functions for their side effects. For numerical work, consider avoiding both and using array computations instead.
Maybe the problem is that you are over-using list comprehensions. I love them too, but what purpose do they serve when the code becomes more convoluted than the loop?
If you want to stay with the list-comprehension-over-everything approach, you could factor the inner loops out into helper functions. This way it is much easier to digest:
def collect_words(file):
    ...

file_to_words_map = {
    filename: collect_words(open(filename))
    for filename in list_of_files if filename.endswith(".txt")
}
Btw., I don't think breaking such statements across multiple lines necessarily makes them clearer (instead, your urge to do so is quite telling). In the above example I intentionally rejoined the for and if part.
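The body of collect_words is left elided above. As a rough sketch only, one possible version, assuming it receives an open text file and should return the set of whitespace-separated words, might look like the following. The io.StringIO object stands in for a real file purely for illustration, and the sample text is made up.

```python
import io

def collect_words(file):
    # Assumed behaviour: return the set of whitespace-separated words
    # found in an already-open text file.
    return {word for line in file for word in line.split()}

# io.StringIO stands in for a real open file in this illustration.
sample = io.StringIO("one two\ntwo three\n")
words = collect_words(sample)
print(sorted(words))  # ['one', 'three', 'two']
```

The flat two-clause set comprehension inside the helper reads in the same order as the nested loops it replaces, so the outer dictionary comprehension only has to deal with filenames.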
One approach is to use generators! Because you end up writing your code in terms of statements instead of expressions, it is much easier to extend.
def words_of_file(filename):
    """opens the file specified and generates all words present."""
    with open(filename) as file:
        for line in file:
            for word in line.split():
                yield word

def get_words_in_files(list_of_files):
    """generates tuples of form (filename, set of words) for all text files in the given list"""
    for filename in list_of_files:
        # skip non text files
        if not filename.endswith(".txt"):
            continue  # can use continue to skip instead of nesting everything
        # use a distinct name so the words_of_file function is not shadowed
        words = set(words_of_file(filename))
        # dict constructor takes (key,value) tuples.
        yield (filename, words)

file_to_words_map = dict(get_words_in_files(["a.txt", "b.txt", "image.png"]))
Using generators has a number of benefits:
We can use statements such as with and continue, variable assignment, and debugging print statements, all because we are in a block scope instead of an expression scope.
words_of_file just generates the words; it doesn't dictate that they must be put into a set. Some other code may choose to iterate over the words directly or pass them to the list constructor instead. Maybe a collections.Counter would be useful too! The point is that the generator lets the caller decide how to use the sequence.
This doesn't stop you from using comprehensions or other shortcuts either. If you want to yield all the elements of an iterator you can just yield from that iterator, so you might end up with some code like this:
def words_of_file(filename):
    """opens the file specified and generates all words present."""
    with open(filename) as file:
        for line in file:
            # produces all words of the line
            yield from line.split()

file_to_words_map = {filename: set(words_of_file(filename))
                     for filename in list_of_files
                     if filename.endswith(".txt")
                     }
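To illustrate the point about the caller choosing the container, here is a small sketch that writes a throwaway file and consumes the same generator three ways. The file contents and the use of tempfile are assumptions made for the demonstration.

```python
import collections
import os
import tempfile

def words_of_file(filename):
    """opens the file specified and generates all words present."""
    with open(filename) as file:
        for line in file:
            yield from line.split()

# Create a throwaway text file (sample contents are made up for the demo).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("spam eggs\nspam ham\n")
    path = tmp.name

try:
    as_set = set(words_of_file(path))                      # unique words
    as_list = list(words_of_file(path))                    # words in file order
    as_counts = collections.Counter(words_of_file(path))   # word frequencies
finally:
    os.remove(path)

print(as_list)              # ['spam', 'eggs', 'spam', 'ham']
print(as_counts["spam"])    # 2
```

Each call to words_of_file opens the file afresh, so the three consumers do not interfere with one another.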
Different people have different opinions. My favourite is the generator-only option because I am a very large fan of generators. I'm sure some people like the one-liner solution of the nested comprehension, but this last version, which uses a simple comprehension and helper functions, is probably what most people would be most comfortable with.