I just read a brilliant reply from Sloth on the question "Remove lines that contain certain string" while searching for a way to filter garbage lines out of a txt / csv file. The gist is: take a list of words/strings/whatever, then go through the input file line by line, writing out only the lines that contain none of them.
The code he posted was:
bad_words = ['bad', 'naughty']
with open('oldfile.txt') as oldfile, open('newfile.txt', 'w') as newfile:
    for line in oldfile:
        if not any(bad_word in line for bad_word in bad_words):
            newfile.write(line)
My question is: Would someone explain the line if not any(bad_word in line for bad_word in bad_words): ?
I tried just putting in if not any(bad_word in line): but it gave me an error.
I am trying to understand why. A cursory search of the Python docs website didn't help me (I'm new to Python/programming and might not be too bright to boot :-) ).
Any references for me to read are appreciated.
Thanks!
Would someone explain the line
if not any(bad_word in line for bad_word in bad_words)
Sure.
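bad_word in line for bad_word in bad_words is what's called a generator expression. It is very similar to a list comprehension, but more memory efficient: it produces its values one at a time as any asks for them, instead of building the whole list first.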
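To make the comparison concrete, here is a minimal sketch that evaluates the same test both ways. The sample line and the names as_list and as_gen are made up for illustration; only bad_words comes from the question.

```python
line = "this is a naughty line\n"   # hypothetical sample line
bad_words = ['bad', 'naughty']

# List comprehension: builds the complete list of booleans in memory.
as_list = [bad_word in line for bad_word in bad_words]

# Generator expression: yields the same booleans lazily, one per request.
as_gen = (bad_word in line for bad_word in bad_words)

print(as_list)        # [False, True]
print(any(as_list))   # True
print(any(as_gen))    # True
```

When a generator expression is the only argument to a function, as in any(...), the extra pair of parentheses around it can be dropped, which is why the original line has just one pair.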
if not any(bad_word in line for bad_word in bad_words):
    newfile.write(line)
is basically equivalent to:
list1 = []
for bad_word in bad_words:
    if bad_word in line:
        list1.append(True)
    else:
        list1.append(False)
if not any(list1):
    newfile.write(line)
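One difference worth knowing: the list version checks every bad word before calling any, whereas any with a generator expression stops at the first match. A rough sketch of that behaviour as a plain loop follows; the function name contains_bad_word is hypothetical and not part of the original code.

```python
def contains_bad_word(line, bad_words):
    """Return True as soon as any bad word is found in line."""
    for bad_word in bad_words:
        if bad_word in line:
            return True    # stop at the first match, like any() does
    return False           # no bad word matched


bad_words = ['bad', 'naughty']
print(contains_bad_word("a naughty line", bad_words))  # True
print(contains_bad_word("a clean line", bad_words))    # False
```

Note that in on strings is a substring test, so a line containing "badge" would also count as containing 'bad'.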
I tried just putting in
if not any(bad_word in line): but it gave me an error
Yeah, because any takes an iterable as input, and you have provided a boolean (bad_word in line evaluates to True or False, you can't iterate over it).
Try providing something you can iterate over, such as a list: if not any([True, False, True]):
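You can reproduce the error in isolation. This is a small sketch with made-up values for line and bad_word; in the original script bad_word would not even be defined outside the generator expression, so you would probably see a NameError first instead.

```python
line = "a naughty line"   # hypothetical sample values
bad_word = "naughty"

message = None
try:
    any(bad_word in line)   # same as any(True): a bool is not iterable
except TypeError as exc:
    message = str(exc)
    print(message)          # 'bool' object is not iterable

# Wrapping the single result in a list gives any() something to iterate over.
print(any([bad_word in line]))  # True
```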