Short answer: To filter a list of lists for a condition on the inner lists, use the list comprehension statement [x for x in list if condition(x)] and replace condition(x) with your filtering condition that returns True to include inner list x , and False otherwise.
A boolean list ("blist") is a list that has no holes and contains only true and false . Boolean lists can be represented in an efficient compact form, see 22.5 for details. Boolean lists are lists and all operations for lists are therefore applicable to boolean lists.
Python has a built-in function called filter() that allows you to filter a list (or a tuple) in a more beautiful way. The filter() function iterates over the elements of the list and applies the fn() function to each element. It returns an iterator for the elements where the fn() returns True .
Boolean indexing is a type of indexing that uses actual values of the data in the DataFrame. In boolean indexing, we can filter a data in four ways: Accessing a DataFrame with a boolean index. Applying a boolean mask to a dataframe. Masking data based on column value.
You're looking for itertools.compress
:
>>> from itertools import compress
>>> list_a = [1, 2, 4, 6]
>>> fil = [True, False, True, False]
>>> list(compress(list_a, fil))
[1, 4]
>>> list_a = [1, 2, 4, 6]
>>> fil = [True, False, True, False]
>>> %timeit list(compress(list_a, fil))
100000 loops, best of 3: 2.58 us per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v] #winner
100000 loops, best of 3: 1.98 us per loop
>>> list_a = [1, 2, 4, 6]*100
>>> fil = [True, False, True, False]*100
>>> %timeit list(compress(list_a, fil)) #winner
10000 loops, best of 3: 24.3 us per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v]
10000 loops, best of 3: 82 us per loop
>>> list_a = [1, 2, 4, 6]*10000
>>> fil = [True, False, True, False]*10000
>>> %timeit list(compress(list_a, fil)) #winner
1000 loops, best of 3: 1.66 ms per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v]
100 loops, best of 3: 7.65 ms per loop
Don't use filter
as a variable name, it is a built-in function.
Like so:
filtered_list = [i for (i, v) in zip(list_a, filter) if v]
Using zip
is the pythonic way to iterate over multiple sequences in parallel, without needing any indexing. This assumes both sequences have the same length (zip stops after the shortest runs out). Using itertools
for such a simple case is a bit overkill ...
One thing you do in your example you should really stop doing is comparing things to True, this is usually not necessary. Instead of if filter[idx]==True: ...
, you can simply write if filter[idx]: ...
.
With numpy:
In [128]: list_a = np.array([1, 2, 4, 6])
In [129]: filter = np.array([True, False, True, False])
In [130]: list_a[filter]
Out[130]: array([1, 4])
or see Alex Szatmary's answer if list_a can be a numpy array but not filter
Numpy usually gives you a big speed boost as well
In [133]: list_a = [1, 2, 4, 6]*10000
In [134]: fil = [True, False, True, False]*10000
In [135]: list_a_np = np.array(list_a)
In [136]: fil_np = np.array(fil)
In [139]: %timeit list(itertools.compress(list_a, fil))
1000 loops, best of 3: 625 us per loop
In [140]: %timeit list_a_np[fil_np]
10000 loops, best of 3: 173 us per loop
To do this using numpy, ie, if you have an array, a
, instead of list_a
:
a = np.array([1, 2, 4, 6])
my_filter = np.array([True, False, True, False], dtype=bool)
a[my_filter]
> array([1, 4])
filtered_list = [list_a[i] for i in range(len(list_a)) if filter[i]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With