Given the following list:
DNA_list = ['ATAT', 'GTGTACGT', 'AAAAGGTT']
I want to filter strings longer than 3 characters. I achieve this with the following code:
With a for loop:
long_dna = []
for element in DNA_list:
    # len() already returns an int, so no conversion is needed
    if len(element) > 3:
        long_dna.append(element)
print(long_dna)
But I want my code to be more general, so that I can later filter by any length threshold. So I use a function and a for loop:
def get_long(dna_seq, threshold):
    return len(dna_seq) > threshold

long_dna_loop2 = []
for element in DNA_list:
    # get_long() already returns a bool, so no "is True" comparison is needed
    if get_long(element, 3):
        long_dna_loop2.append(element)
print(long_dna_loop2)
I want to achieve the same generality using filter(), but I cannot. If I use the above function get_long(), I have no way to pass the threshold argument to it when I use it with filter(). Is it just not possible, or is there a way around it?
My code with filter() for the specific case:
def is_long(dna):
    return len(dna) > 3

long_dna_filter = filter(is_long, DNA_list)
Use a lambda to provide the threshold, like this:
filter(lambda seq: get_long(seq, 3), DNA_list)
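Another way to bind the threshold, sketched here as an alternative to the lambda, is functools.partial from the standard library. The names longer_than_4 and long_dna_partial are made up for this illustration; get_long() is the function from the question.

```python
from functools import partial

DNA_list = ['ATAT', 'GTGTACGT', 'AAAAGGTT']

def get_long(dna_seq, threshold):
    return len(dna_seq) > threshold

# Bind threshold by keyword; filter() then supplies dna_seq positionally.
# longer_than_4 is a hypothetical name used only in this sketch.
longer_than_4 = partial(get_long, threshold=4)

# list() makes the result a list on Python 3 as well,
# where filter() returns a lazy iterator.
long_dna_partial = list(filter(longer_than_4, DNA_list))
# long_dna_partial == ['GTGTACGT', 'AAAAGGTT']
```

Whether this reads better than the lambda is a matter of taste; it mainly helps when the same bound predicate is reused in several places.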
Do you need to use filter()? Why not use a more Pythonic list comprehension?
Example:
>>> DNA_list = ['ATAT', 'GTGTACGT', 'AAAAGGTT']
>>> threshold = 3
>>> long_dna = [dna_seq for dna_seq in DNA_list if len(dna_seq) > threshold]
>>> long_dna
['ATAT', 'GTGTACGT', 'AAAAGGTT']
>>> threshold = 4
>>> [dna_seq for dna_seq in DNA_list if len(dna_seq) > threshold]
['GTGTACGT', 'AAAAGGTT']
This method has the advantage that it's trivial to convert it to a generator, which can reduce memory use and, depending on your application, execution time. For example, if you have a lot of DNA sequences and only want to iterate over them, realising them as a list will consume a lot of memory in one go. The equivalent generator simply requires replacing the square brackets [] with round brackets ():
>>> long_dna = (dna_seq for dna_seq in DNA_list if len(dna_seq) > threshold)
>>> long_dna
<generator object <genexpr> at 0x7f50de229cd0>
>>> list(long_dna)
['GTGTACGT', 'AAAAGGTT']
In Python 2 this performance improvement is not an option with filter() because it returns a list. In Python 3 filter() returns a lazy filter object that is more akin to a generator.
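As a rough illustration of that difference, the sketch below assumes Python 3 (on Python 2, filter() returns a list and the next() call would fail). The names lazy_long_dna, first and rest are made up for this example.

```python
DNA_list = ['ATAT', 'GTGTACGT', 'AAAAGGTT']
threshold = 4

# Assumes Python 3: filter() returns a lazy iterator, so nothing is
# evaluated until items are requested.
lazy_long_dna = filter(lambda seq: len(seq) > threshold, DNA_list)

# Pull one item at a time...
first = next(lazy_long_dna)   # 'GTGTACGT'

# ...or consume whatever remains in one go.
rest = list(lazy_long_dna)    # ['AAAAGGTT']
```

Like a generator, the filter object is exhausted after one pass, so wrap it in list() up front if you need to iterate more than once.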