Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Complexity analysis of filter() with a lambda function

Given two lists, list1 and list2

list3 = filter(lambda x: x in list1,list2)

This returns the intersection of the two lists.

How can I find the complexity of this algorithm? I have found out that the time complexity of x in list1 is O(n) where n is number of elements in the list, but how about the filter?

like image 945
Tomarinator Avatar asked Jan 09 '23 14:01

Tomarinator


1 Answers

Your code does O(len(list1) * len(list2)) comparison operations for elements.

  • Your lambda function is executed O(len(list2)) times, once per each element filtered. See documentation for filter in Python 3 (Python 2):

    filter(function, iterable)

    Construct an iterator from those elements of iterable for which function returns true. iterable may be either a sequence, a container which supports iteration, or an iterator

    (emphasis mine)

    Clearly the function is called at least 1 time for each (distinct) element in iterable - knowing when not to need call it would mean also solving the Halting problem in general case, which not even the Python core developers are yet to solve ;-). In practice in CPython 3 the filter builtin creates an iterator which when advanced, executes the function once for each element (distinct or not) in the iteration order.

  • The x in list1 does O(len(list1)) comparisons in average and worst case, as documented.


To speed it up, use a set; also you do not need a lambda function at all (use __contains__ magic method)

list3 = filter(set(list1).__contains__, list2)

This builds a set of the list1 once in O(len(list1)) time and runs the filter against it with O(len(list2)) average complexity for O(len(list1) + len(list2))


If the ordering of elements of list2 does not matter then you can also do

set(list1).intersection(list2)

which should have lower constants than doing the filter above; and for really fast code, you should order the lists so that the smaller is turned into a set (since both intersection and set building have documented average complexity of O(n), but set building most probably will have larger constants due to resizing the set, thus it will make sense to build the set from the smaller to decrease the weight of these constants):

smaller, larger = sorted([list1, list2], key=len)
result = set(smaller).intersection(larger)

Notice that Python 2 and 3 differ from each other. filter in Python 3 returns a generator, and the actual running time depends on the number of elements consumed from the resulting generator, whereas in Python 2 a list will be generated upfront, which might be more costly if you need only the first values.

like image 134