I need to filter a large lists several times, but I'm concerned with both simplicity of code and execution efficiency. To give an example:
all_things # huge collection of all things
# inefficient but clean code
def get_clothes():
return filter(lambda t: t.garment, allThings)
def get_hats():
return filter(lambda t: t.headgear, get_clothes())
I'm concerned that I'm iterating over the clothes list when in fact it has already been iterated over. I also want to keep the two filter operations separate, as they belong to two different classes, and I do not want to duplicate the first lambda function in the hats class.
# efficient but duplication of code
def get_clothes():
return filter(lambda t: t.garment, allThings)
def get_hats():
return filter(lambda t: t.headgear and t.garment, allThings)
I have been investigating generator functions, as they seemed like the way to go, but I haven't as yet figure out how.
First of all using filter
/lambda
combination is going to be deprecated. Current functional programming style is described in Python Functional Programming HOWTO.
Secondly, if you concerned with efficiency, rather than construct lists, you should return generators. In this case they are simple enough to use generator expressions.
def get_clothes():
return (t for t in allThings if t.garment)
def get_hats():
return (t for t in get_clothes() if t.headgear)
Or if you'd prefer, true generators (allegedly more pythonic):
def get_clothes():
for t in allThings:
if t.garment:
yield t
def get_hats():
for t in get_clothes():
if t.headgear:
yield t
If for some reason, sometimes you need list
rather than iterator
, you can construct list by simple casting:
hats_list = list(get_hats())
Note, that above will not construct list of clothes, thus efficiency is close to your duplicate code version.
I was looking for similar filtering of lists but wanted to have a slightly different format to what was presented here.
The get_hats()
call above is good but limited in its reuse. I was looking for something more like get_hats(get_clothes(all_things))
, where you can specify a source (all_things)
, and then as few or as many levels of filters get_hats()
, get_clothes()
as you want.
I found a way to do that with generators:
def get_clothes(in_list):
for item in in_list:
if item.garment:
yield item
def get_hats(in_list):
for item in in_list:
if item.headgear:
yield item
This can then be called by:
get_hats(get_clothes(all_things))
I tested the original solutions, vartec's solution and this additional solution to see the efficiency, and was somewhat surprised by the results. Code as follows:
Setup:
class Thing:
def __init__(self):
self.garment = False
self.headgear = False
all_things = [Thing() for i in range(1000000)]
for i, thing in enumerate(all_things):
if i % 2 == 0:
thing.garment = True
if i % 4 == 0:
thing.headgear = True
Original solutions:
def get_clothes():
return filter(lambda t: t.garment, all_things)
def get_hats():
return filter(lambda t: t.headgear, get_clothes())
def get_clothes2():
return filter(lambda t: t.garment, all_things)
def get_hats2():
return filter(lambda t: t.headgear and t.garment, all_things)
My solution:
def get_clothes3(in_list):
for item in in_list:
if item.garment:
yield item
def get_hats3(in_list):
for item in in_list:
if item.headgear:
yield item
vartec's solution:
def get_clothes4():
for t in all_things:
if t.garment:
yield t
def get_hats4():
for t in get_clothes4():
if t.headgear:
yield t
Timing code:
import timeit
print 'get_hats()'
print timeit.timeit('get_hats()', 'from __main__ import get_hats', number=1000)
print 'get_hats2()'
print timeit.timeit('get_hats2()', 'from __main__ import get_hats2', number=1000)
print '[x for x in get_hats3(get_clothes3(all_things))]'
print timeit.timeit('[x for x in get_hats3(get_clothes3(all_things))]',
'from __main__ import get_hats3, get_clothes3, all_things',
number=1000)
print '[x for x in get_hats4()]'
print timeit.timeit('[x for x in get_hats4()]',
'from __main__ import get_hats4', number=1000)
Results:
get_hats()
379.334653854
get_hats2()
232.768362999
[x for x in get_hats3(get_clothes3(all_things))]
214.376812935
[x for x in get_hats4()]
218.250688076
The generator expressions appear to be slightly faster, the difference in time between my and vartec's solutions are probably just noise. But I prefer the flexibility of being able to apply whatever filters are required in whatever order.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With