I have a dataframe that contains info about movies. It has a column called genre, which holds the list of genres each movie belongs to. For example:
    df['genre']
    ## returns
    0    ['comedy', 'sci-fi']
    1    ['action', 'romance', 'comedy']
    2    ['documentary']
    3    ['crime','horror']
    ...
How can I query the dataframe so that it tells me whether each movie belongs to a certain genre?
For example, something like df['genre'].contains('comedy') that returns 0 or 1 for each row.
I know for a list, I can do things like:
'comedy' in ['comedy', 'sci-fi']
However, I couldn't find anything similar in pandas. The only thing I know is df['genre'].str.contains(), but it doesn't work for the list type.
You can use apply to create a mask and then filter with boolean indexing:
    mask = df.genre.apply(lambda x: 'comedy' in x)
    df1 = df[mask]
    print (df1)
                           genre
    0           [comedy, sci-fi]
    1  [action, romance, comedy]
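For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. The sample dataframe is rebuilt from the question, and the `is_comedy` column name is only an illustrative assumption for the 0/1 result the question asks about.

```python
import pandas as pd

# Sample data mirroring the question (each cell holds a Python list).
df = pd.DataFrame({
    "genre": [
        ["comedy", "sci-fi"],
        ["action", "romance", "comedy"],
        ["documentary"],
        ["crime", "horror"],
    ]
})

# Boolean mask: True where the row's list contains 'comedy'.
mask = df["genre"].apply(lambda genres: "comedy" in genres)

# Hypothetical 0/1 column, as requested in the question.
df["is_comedy"] = mask.astype(int)

# Boolean indexing keeps only the matching rows.
comedies = df[mask]
print(comedies)
```

The mask itself is usually the more useful object, since it can be combined with other conditions using `&` and `|` before indexing.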
Using sets:

    df.genre.map(set(['comedy']).issubset)
    0     True
    1     True
    2    False
    3    False
    dtype: bool

    df.genre[df.genre.map(set(['comedy']).issubset)]
    0             [comedy, sci-fi]
    1    [action, romance, comedy]
    dtype: object
Presented in a way I like better:

    comedy = set(['comedy'])
    iscomedy = comedy.issubset
    df[df.genre.map(iscomedy)]
More efficient:

    comedy = set(['comedy'])
    iscomedy = comedy.issubset
    df[[iscomedy(l) for l in df.genre.values.tolist()]]
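One reason the set-based form can be worth having is that it extends to several genres at once. The sketch below is an illustration rather than part of the original answer: the `required` set and the "any of" variant using `isdisjoint` are assumptions added to show the pattern.

```python
import pandas as pd

df = pd.DataFrame({
    "genre": [
        ["comedy", "sci-fi"],
        ["action", "romance", "comedy"],
        ["documentary"],
        ["crime", "horror"],
    ]
})

# Hypothetical query: movies tagged with both genres.
required = {"comedy", "romance"}

# set.issubset accepts any iterable, so each list can be passed directly.
has_all = df["genre"].map(required.issubset)

# "Any of" variant: the row matches if it shares at least one genre.
has_any = df["genre"].map(lambda genres: not required.isdisjoint(genres))

print(df[has_all])
print(df[has_any])
```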
Using str, in two passes. Slow, and not perfectly accurate, because it matches substrings of the joined text rather than whole genres:
df[df.genre.str.join(' ').str.contains('comedy')]
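To make the accuracy caveat concrete, here is a small sketch comparing the joined-string check with an exact membership test. The 'tragicomedy' genre is a made-up value chosen only to trigger a false positive; it does not appear in the question's data.

```python
import pandas as pd

# 'tragicomedy' is a hypothetical genre that contains 'comedy' as a substring.
df = pd.DataFrame({
    "genre": [
        ["comedy", "sci-fi"],
        ["tragicomedy"],
        ["documentary"],
    ]
})

# Pass 1 joins each list into one string; pass 2 searches that string.
joined = df["genre"].str.join(" ")
substring_mask = joined.str.contains("comedy", regex=False)

# Exact membership test on the original lists, for comparison.
exact_mask = df["genre"].apply(lambda genres: "comedy" in genres)

print(pd.DataFrame({"substring": substring_mask, "exact": exact_mask}))
```

The two masks disagree on the second row, which is the kind of mismatch the warning refers to.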