
Python & Pandas: How to query if a list-type column contains something?

Tags:

python

pandas

I have a DataFrame with information about movies. It has a column called genre, where each cell holds a list of the genres the movie belongs to. For example:

    df['genre']
    # returns
    0       ['comedy', 'sci-fi']
    1       ['action', 'romance', 'comedy']
    2       ['documentary']
    3       ['crime', 'horror']
    ...
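To experiment with the answers below, the example column can be rebuilt as a DataFrame. This is a minimal sketch that uses only the four rows shown above; the real data presumably has more rows and other columns.

```python
import pandas as pd

# The four example rows from the question. Each cell of 'genre' is a Python
# list, so pandas stores the column with the generic object dtype.
df = pd.DataFrame({
    "genre": [
        ["comedy", "sci-fi"],
        ["action", "romance", "comedy"],
        ["documentary"],
        ["crime", "horror"],
    ]
})

print(df["genre"].dtype)          # object
print(type(df.loc[0, "genre"]))   # <class 'list'>
```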

How can I query the DataFrame so that it returns the movies that belong to a certain genre?

For example, something like df['genre'].contains('comedy') that returns 0 or 1 for each row.

For a plain list, I know I can do something like:

    'comedy' in ['comedy', 'sci-fi']

However, I couldn't find anything similar in pandas. The only thing I know of is df['genre'].str.contains(), but it doesn't work when the cells are lists.
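A minimal reproduction of why df['genre'].str.contains() does not help here, assuming pandas 1.x or 2.x: the .str methods operate on string cells, and for a cell that holds a list the pandas versions we are aware of return a missing value (NaN) instead of True/False. A per-row Python membership test, by contrast, behaves like the plain-list example above.

```python
import pandas as pd

df = pd.DataFrame({"genre": [
    ["comedy", "sci-fi"],
    ["action", "romance", "comedy"],
    ["documentary"],
    ["crime", "horror"],
]})

# .str methods expect string cells. A list is not a string, so each row
# comes back as a missing value rather than a boolean.
via_str = df["genre"].str.contains("comedy")
print(via_str.tolist())

# A per-row membership test with Python's `in` gives the intended answer.
via_in = [("comedy" in genres) for genres in df["genre"]]
print(via_in)  # [True, True, False, False]
```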

cqcn1991 asked Jan 07 '17 07:01

2 Answers

You can use apply to create a mask and then use boolean indexing:

    mask = df.genre.apply(lambda x: 'comedy' in x)
    df1 = df[mask]
    print(df1)
                           genre
    0           [comedy, sci-fi]
    1  [action, romance, comedy]
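A self-contained version of this approach, built on the four example rows from the question. The final line converts the boolean mask into the 0/1 values the question asked for; that astype(int) step is an addition for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"genre": [
    ["comedy", "sci-fi"],
    ["action", "romance", "comedy"],
    ["documentary"],
    ["crime", "horror"],
]})

# Series.apply calls the lambda once per cell; each cell is a plain list,
# so Python's `in` operator works exactly as it does outside pandas.
mask = df["genre"].apply(lambda genres: "comedy" in genres)

# Boolean indexing keeps only the rows where the mask is True.
comedies = df[mask]
print(comedies)

# 0/1 per row instead of True/False.
print(mask.astype(int).tolist())  # [1, 1, 0, 0]
```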
jezrael answered Oct 01 '22 06:10


Using sets:

    df.genre.map(set(['comedy']).issubset)

    0     True
    1     True
    2    False
    3    False
    dtype: bool

    df.genre[df.genre.map(set(['comedy']).issubset)]

    0             [comedy, sci-fi]
    1    [action, romance, comedy]
    dtype: object

Presented in a way I like better:

    comedy = set(['comedy'])
    iscomedy = comedy.issubset
    df[df.genre.map(iscomedy)]
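A runnable sketch of the set-based variant. It relies on set.issubset accepting any iterable, so each list cell can be passed to the bound method without converting it to a set first. The second half is an illustrative extension, not from the answer: the same pattern selects movies that have all of several genres, here the hypothetical combination of comedy and romance.

```python
import pandas as pd

df = pd.DataFrame({"genre": [
    ["comedy", "sci-fi"],
    ["action", "romance", "comedy"],
    ["documentary"],
    ["crime", "horror"],
]})

# Bound method: is_comedy(cell) is the same as {"comedy"}.issubset(cell).
is_comedy = {"comedy"}.issubset
comedy_mask = df["genre"].map(is_comedy)
print(comedy_mask.tolist())  # [True, True, False, False]

# Requiring several genres at once only changes the set.
is_romcom = {"comedy", "romance"}.issubset
print(df[df["genre"].map(is_romcom)])
```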

More efficient:

    comedy = set(['comedy'])
    iscomedy = comedy.issubset
    df[[iscomedy(l) for l in df.genre.values.tolist()]]
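The efficiency claim can be checked locally with timeit. This sketch assumes a hypothetical frame of 40,000 rows made by repeating the four example rows; absolute timings depend on the machine and pandas version, so none are quoted here. It also confirms that both variants select the same rows.

```python
import timeit

import pandas as pd

rows = [
    ["comedy", "sci-fi"],
    ["action", "romance", "comedy"],
    ["documentary"],
    ["crime", "horror"],
]
# Hypothetical benchmark data: the question's four rows repeated 10,000 times.
df = pd.DataFrame({"genre": rows * 10_000})

iscomedy = {"comedy"}.issubset


def with_map():
    return df[df["genre"].map(iscomedy)]


def with_listcomp():
    # Plain Python loop over the underlying lists, then a list of bools as indexer.
    return df[[iscomedy(genres) for genres in df["genre"].values.tolist()]]


# Both variants must select exactly the same rows.
assert with_map().index.equals(with_listcomp().index)

for fn in (with_map, with_listcomp):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```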

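Using str in two passes (first join each list into one string, then search that string).
This is slow, and not perfectly accurate: it matches substrings rather than whole genre names.

    df[df.genre.str.join(' ').str.contains('comedy')]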
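A small sketch of why the string-based version is not perfectly accurate. It assumes a hypothetical genre name, dark-comedy, that contains the text "comedy" without being the comedy genre; joining the list and searching the resulting string cannot tell the two apart, whereas an exact membership test can.

```python
import pandas as pd

# Hypothetical data: row 1's only genre merely contains the text "comedy".
df = pd.DataFrame({"genre": [["comedy", "sci-fi"], ["dark-comedy"], ["drama"]]})

# Pass 1: join each list into one string. Pass 2: substring search.
text_mask = df["genre"].str.join(" ").str.contains("comedy")

# Exact membership test for comparison.
exact_mask = df["genre"].apply(lambda genres: "comedy" in genres)

print(text_mask.tolist())   # [True, True, False]  (row 1 is a false positive)
print(exact_mask.tolist())  # [True, False, False]
```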
piRSquared answered Oct 01 '22 06:10