
How to filter a DataFrame column of lists for those that contain a certain item

If I want to filter a column of strings for those that contain a certain term, I can do so like this:

import pandas as pd

df = pd.DataFrame({'col': ['ab', 'ac', 'abc']})
df[df['col'].str.contains('b')]

returns:

   col
0   ab
2  abc

How can I filter a column of lists for those that contain a certain item? For example, from

df = pd.DataFrame({'col':[['a','b'],['a','c'],['a','b','c']]})

how can I get all lists containing 'b'?

         col
0     [a, b]
2  [a, b, c]

rurp asked Aug 28 '15 22:08





1 Answer

You can use apply with a membership test on each list, like this:

In [13]: df[df['col'].apply(lambda x: 'b' in x)]
Out[13]: 
         col
0     [a, b]
2  [a, b, c]

That said, storing lists in a DataFrame is generally a bit awkward. You might find that a different representation (a column for each element in the list, a MultiIndex, etc.) is easier to work with.
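
As a rough sketch of one such alternative representation, the lists can be flattened to one row per element, filtered with an ordinary comparison, and mapped back to the original rows. This assumes pandas 0.25 or later, where Series.explode is available; the names exploded, matching_labels and result are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col': [['a', 'b'], ['a', 'c'], ['a', 'b', 'c']]})

# Assumption: pandas >= 0.25, which provides Series.explode.
# explode() yields one row per list element and repeats the
# original index label for every element of the same list.
exploded = df['col'].explode()

# Index labels of the original rows whose list contains 'b'.
matching_labels = exploded[exploded == 'b'].index.unique()

# Select those rows from the original DataFrame.
result = df.loc[matching_labels]
print(result)
```

This selects the same rows as the apply version (index 0 and 2). Keeping the exploded Series around can be convenient if several different membership filters are needed, since each one then becomes a plain vectorized comparison.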

chrisb answered Oct 21 '22 08:10
