Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

List comprehension vs filter vs remove

How are the latter two better than the former solution?

primes = (1, 2, 3, 5, 7)

# Classic solution
items = list(range(10))
for prime in primes:
    items.remove(prime)
items

# List comprehension
items = list(range(10))
[item for item in items if item not in primes]

# Filter
items = list(range(10))
list(filter(lambda item: item not in primes, items))

The three examples are something I came across in a book and it says that the first solutions takes O(n*m) time (n=len(items), m=len(primes)) whereas the latter two take O(n*1) time... Resulting in 50 comparisons for the first solution (slightly better actually - 40 comparisons) and just 10 for the latter.

I do not understand this. I don't understand how could it be time or memory efficient.

Here is the paragraph in the book that explains this:

To remove or insert a single item from/into the list, Python needs to copy the entire list, which is especially heavy with larger lists. When executing this only once, it is of course not all that bad. But when executing a large number of deletions, a filter or list comprehension is a much faster solution because, if properly structured, it needs to copy the list only once. .... then the examples ... The latter two are much faster for large lists of items. This is because the operations are much faster. To compare using n=len(items) and m=len(primes), the first takes O(m*n)=5*10=50 operations, whereas the latter two take O(n*1)=10*1=10 operations.

EDIT: The book is not wrong. primes = set((1, 2, 3, 5, 7)) is the right declaration and not primes = (1, 2, 3, 5, 7)

like image 866
itachi Avatar asked Apr 10 '26 10:04

itachi


1 Answers

If the code in the book is exactly the same as you posted, then the book is flat out wrong.

The first example has time complexity O(n*m), but so do the other two.

If primes were a set (or dict), then it would be true -- existence lookup with in operator in a hashmap has time complexity O(1), but in a list or tuple has O(n)! Therefore, the total complexity of O(n*m).

Let's check this with some measurements:

t = tuple(range(10000))
l = list(t)
s = set(t)
d = {i:1 for i in l}

In [16]: %%timeit
4738 in t
   ....: 
10000 loops, best of 3: 45.5 µs per loop

In [17]: %%timeit
4738 in l
   ....: 
10000 loops, best of 3: 45.4 µs per loop

In [18]: %%timeit
4738 in s
   ....: 
10000000 loops, best of 3: 36.9 ns per loop

In [19]: %%timeit
4738 in d
   ....: 
10000000 loops, best of 3: 38 ns per loop

Notice the lookup in set is ~37ns (similar as in dict), 3 orders of magnitude faster than in list/tuple, ~45us.

like image 135
randomir Avatar answered Apr 12 '26 00:04

randomir



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!