
Passing a function with two arguments to filter() in python

Given the following list:

DNA_list = ['ATAT', 'GTGTACGT', 'AAAAGGTT']

I want to keep only the strings longer than 3 characters. I achieve this with the following code:

With a for loop:

long_dna = []
for element in DNA_list:
    # len() already returns an int, so no conversion is needed
    if len(element) > 3:
        long_dna.append(element)
print(long_dna)

But I want my code to be more general, so that I can later filter on any length threshold, so I use a function and a for loop:

def get_long(dna_seq, threshold):
    return len(dna_seq) > threshold

long_dna_loop2 = []
for element in DNA_list:
    if get_long(element, 3):
        long_dna_loop2.append(element)
print(long_dna_loop2)

I want the same generality using filter(), but I cannot get it to work. If I use the get_long() function above, I cannot pass the threshold argument to it when I use it with filter(). Is this simply not possible, or is there a way around it?

My code with filter() for the specific case:

def is_long(dna):
    return len(dna) > 3

long_dna_filter = filter(is_long, DNA_list)
Homap asked Jan 05 '16 10:01



2 Answers

Use a lambda to supply the threshold, like this:

filter(lambda seq: get_long(seq, 3),
       DNA_list)
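
As a related sketch that is not part of the original answer, functools.partial from the standard library can bind the threshold in place of the lambda. It assumes the get_long() function and DNA_list from the question; the name longer_than_3 is made up for illustration.

```python
from functools import partial

DNA_list = ['ATAT', 'GTGTACGT', 'AAAAGGTT']

def get_long(dna_seq, threshold):
    # True when the sequence is strictly longer than the threshold
    return len(dna_seq) > threshold

# Bind threshold by keyword; the resulting callable takes only the sequence.
longer_than_3 = partial(get_long, threshold=3)  # hypothetical name

# list() materialises the result on Python 3, where filter() is lazy;
# on Python 2 it simply copies the list filter() already returned.
long_dna_filter = list(filter(longer_than_3, DNA_list))
print(long_dna_filter)
```

Whether partial or a lambda reads better is largely a matter of taste; partial may be convenient when the same bound function is reused in several places.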
Andrea Corbellini answered Oct 05 '22 14:10



Do you need to use filter()? Why not use a more Pythonic list comprehension?

Example:

>>> DNA_list = ['ATAT', 'GTGTACGT', 'AAAAGGTT']
>>> threshold = 3
>>> long_dna = [dna_seq for dna_seq in DNA_list if len(dna_seq) > threshold]
>>> long_dna
['ATAT', 'GTGTACGT', 'AAAAGGTT']

>>> threshold = 4
>>> [dna_seq for dna_seq in DNA_list if len(dna_seq) > threshold]
['GTGTACGT', 'AAAAGGTT']

This approach has the further advantage that it is trivial to convert into a generator expression, which can reduce memory use and, depending on your application, execution time. For example, if you have a lot of DNA sequences and only want to iterate over them, realising them all as a list consumes a lot of memory in one go. The equivalent generator only requires replacing the square brackets [] with round brackets ():

>>> long_dna = (dna_seq for dna_seq in DNA_list if len(dna_seq) > threshold)
>>> long_dna
<generator object <genexpr> at 0x7f50de229cd0>
>>> list(long_dna)
['GTGTACGT', 'AAAAGGTT']

In Python 2 this lazy evaluation is not available with filter(), because it returns a list. In Python 3, filter() returns a filter object, which is a lazy iterator and so behaves much like a generator.
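
To illustrate the Python 3 behaviour described above, here is a minimal sketch that assumes it is run under Python 3. The variable names first and remaining are chosen only for this example.

```python
DNA_list = ['ATAT', 'GTGTACGT', 'AAAAGGTT']
threshold = 4

# On Python 3, filter() returns a lazy iterator rather than a list.
long_dna = filter(lambda dna_seq: len(dna_seq) > threshold, DNA_list)

# Items are produced only on demand.
first = next(long_dna)       # pulls the first matching sequence
remaining = list(long_dna)   # consumes whatever is left
print(first, remaining)
```

Because the filter object is consumed as it is iterated, wrap it in list() if the results are needed more than once.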

mhawke answered Oct 05 '22 14:10
