Python & Pandas: How to query if a list-type column contains something?

Tags: python, pandas

I have a DataFrame that contains information about movies. It has a column called genre, which holds a list of the genres each movie belongs to. For example:

    df['genre']
    ## returns
    0    ['comedy', 'sci-fi']
    1    ['action', 'romance', 'comedy']
    2    ['documentary']
    3    ['crime', 'horror']
    ...

How can I query the DataFrame so that it returns the movies that belong to a certain genre?

For example, something like df['genre'].contains('comedy') that returns 0 or 1.

I know that for a plain list I can do something like:

    'comedy' in ['comedy', 'sci-fi']

However, I couldn't find anything similar in pandas. The only thing I know of is df['genre'].str.contains(), but it doesn't work for list-type values.

857

asked Jan 07 '17 07:01

cqcn1991

2 Answers

You can use apply to create a boolean mask and then filter with boolean indexing:

    mask = df.genre.apply(lambda x: 'comedy' in x)
    df1 = df[mask]
    print(df1)
                           genre
    0           [comedy, sci-fi]
    1  [action, romance, comedy]
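For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question's example, and the names comedies and flags are only illustrative. The last line is an assumption about what the question meant by "returns 0 or 1": it casts the boolean mask to integers.

```python
import pandas as pd

# Sample data mirroring the example in the question.
df = pd.DataFrame({
    "genre": [
        ["comedy", "sci-fi"],
        ["action", "romance", "comedy"],
        ["documentary"],
        ["crime", "horror"],
    ]
})

# True for every row whose list contains 'comedy'.
mask = df["genre"].apply(lambda genres: "comedy" in genres)

# Boolean indexing keeps only the rows where the mask is True.
comedies = df[mask]
print(comedies)

# If 0/1 is preferred over True/False, cast the mask to int.
flags = mask.astype(int)
print(flags.tolist())
```

The mask keeps the original index, so it can also be combined with other conditions using & and |.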

181

answered Oct 01 '22 06:10

jezrael

using sets

    df.genre.map(set(['comedy']).issubset)

    0     True
    1     True
    2    False
    3    False
    dtype: bool

    df.genre[df.genre.map(set(['comedy']).issubset)]

    0             [comedy, sci-fi]
    1    [action, romance, comedy]
    dtype: object

presented in a way I like better

    comedy = set(['comedy'])
    iscomedy = comedy.issubset
    df[df.genre.map(iscomedy)]

more efficient

    comedy = set(['comedy'])
    iscomedy = comedy.issubset
    df[[iscomedy(l) for l in df.genre.values.tolist()]]
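The set-based check above also extends to several genres at once. The sketch below is not from the original answer; the variable names and the chosen genres are only illustrative. It uses issubset when every wanted genre must be present and isdisjoint when any one of them is enough.

```python
import pandas as pd

# Sample data mirroring the example in the question.
df = pd.DataFrame({
    "genre": [
        ["comedy", "sci-fi"],
        ["action", "romance", "comedy"],
        ["documentary"],
        ["crime", "horror"],
    ]
})

# Every wanted genre must appear in the row's list.
wanted_all = {"comedy", "romance"}
has_all = df["genre"].map(wanted_all.issubset)

# At least one wanted genre must appear in the row's list.
wanted_any = {"documentary", "horror"}
has_any = df["genre"].map(lambda genres: not wanted_any.isdisjoint(genres))

print(df[has_all])
print(df[has_any])
```

Both set methods accept any iterable, so the lists in the column do not need to be converted to sets first.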

using str in two passes
slow, and not perfectly accurate, because it matches substrings rather than whole list items

df[df.genre.str.join(' ').str.contains('comedy')]
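A small sketch of where the string approach goes wrong, assuming a hypothetical genre label 'romantic-comedy' that is not in the question's data: joining the lists and searching for 'comedy' matches it as a substring, while a membership test does not.

```python
import pandas as pd

genres = pd.Series([
    ["comedy", "sci-fi"],
    ["romantic-comedy"],  # hypothetical label, used only to show the pitfall
    ["documentary"],
])

# Join each list into one string, then search for the substring.
by_substring = genres.str.join(" ").str.contains("comedy")

# Test each list for an exact element match.
by_membership = genres.map(lambda g: "comedy" in g)

print(by_substring.tolist())
print(by_membership.tolist())
```

The two results differ only on the second row, which is the false positive.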

answered Oct 01 '22 06:10

piRSquared
