If I want to filter a column of strings for those that contain a certain term, I can do so like this:
import pandas as pd
df = pd.DataFrame({'col':['ab','ac','abc']})
df[df['col'].str.contains('b')]
returns:
   col
0   ab
2  abc
How can I filter a column of lists for those that contain a certain item? For example, from
df = pd.DataFrame({'col':[['a','b'],['a','c'],['a','b','c']]})
how can I get all lists containing 'b'?
         col
0     [a, b]
2  [a, b, c]
The DataFrame.filter() method subsets the rows or columns of a DataFrame according to labels in the specified index. Note that this routine does not filter a DataFrame on its contents; the filter is applied to the labels of the index. The items, like, and regex parameters are mutually exclusive.
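To make the label-versus-content distinction concrete, here is a minimal sketch. The extra column `other` and the index labels `row_a`, `row_b`, `row_c` are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical frame: the "other" column and the row labels are
# illustrative assumptions, not data from the question.
df = pd.DataFrame(
    {"col": ["ab", "ac", "abc"], "other": [1, 2, 3]},
    index=["row_a", "row_b", "row_c"],
)

# filter() matches against labels, never against cell values.
by_column = df.filter(items=["col"])      # keep the column named "col"
by_row = df.filter(like="_b", axis=0)     # keep rows whose index label contains "_b"
by_regex = df.filter(regex="^o", axis=1)  # keep columns whose name starts with "o"
```

Note that `like="_b"` selects the row labelled `row_b` even though the values "ab" and "abc" contain the letter b; the cell contents are never inspected.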
Selecting columns based on their name: this is the most basic way to select a single column from a DataFrame. Just put the string name of the column in brackets; this returns a pandas Series. Passing a list in the brackets lets you select multiple columns at the same time and returns a DataFrame.
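As a small illustration of bracket selection, the sketch below uses a hypothetical second column `n` (not in the question) so that a multi-column selection is possible.

```python
import pandas as pd

# The "n" column is an illustrative assumption.
df = pd.DataFrame({"col": ["ab", "ac", "abc"], "n": [1, 2, 3]})

single = df["col"]           # one name in brackets: a Series
multiple = df[["col", "n"]]  # a list of names in brackets: a DataFrame
```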
You can use apply, like this.
In [13]: df[df['col'].apply(lambda x: 'b' in x)]
Out[13]:
         col
0     [a, b]
2  [a, b, c]
That said, storing lists in a DataFrame is generally a bit awkward. You might find that a different representation (a column for each element in the list, a MultiIndex, etc.) is easier to work with.
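One possible alternative representation, offered only as a sketch: a "long" form with one row per list element, built with `DataFrame.explode` (this assumes pandas 0.25 or later, where `explode` was introduced). It is not the method the answer above uses, just one way to get the same rows without calling a Python lambda on every cell.

```python
import pandas as pd

df = pd.DataFrame({"col": [["a", "b"], ["a", "c"], ["a", "b", "c"]]})

# One row per list element; the original index label is repeated
# for every element that came from the same list.
long = df.explode("col")

# Index labels of the original rows whose list contained 'b'.
matching = long[long["col"] == "b"].index.unique()

# Select those rows from the original list-valued frame.
result = df.loc[matching]
```

With the long form, membership tests become ordinary vectorized comparisons, at the cost of a larger frame with a non-unique index.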