How to filter Pandas dataframe using 'in' and 'not in' like in SQL

How can I achieve the equivalents of SQL's IN and NOT IN? I have a list with the required values. Here's the scenario:

    import pandas as pd

    df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
    countries_to_keep = ['UK', 'China']

    # pseudo-code: df[df['country'] not in countries_to_keep]

My current way of doing this is as follows:

    df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
    df2 = pd.DataFrame({'country': ['UK', 'China'], 'matched': True})

    # IN
    df.merge(df2, how='inner', on='country')

    # NOT IN
    not_in = df.merge(df2, how='left', on='country')
    not_in = not_in[pd.isnull(not_in['matched'])]

But this seems like a horrible kludge. Can anyone improve on it?

272

asked Nov 13 '13 17:11

LondonRob

1 Answer

You can use pd.Series.isin.

For "IN" use: something.isin(somewhere)

Or for "NOT IN": ~something.isin(somewhere)

As a worked example:

    >>> import pandas as pd
    >>> df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
    >>> countries_to_keep = ['UK', 'China']
    >>> df
       country
    0       US
    1       UK
    2  Germany
    3    China
    >>> countries_to_keep
    ['UK', 'China']
    >>> df.country.isin(countries_to_keep)
    0    False
    1     True
    2    False
    3     True
    Name: country, dtype: bool
    >>> df[df.country.isin(countries_to_keep)]
      country
    1      UK
    3   China
    >>> df[~df.country.isin(countries_to_keep)]
       country
    0       US
    2  Germany
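The same idea can be written as a standalone script rather than an interactive session. This is a minimal sketch using the data from the question; the variable names `mask`, `in_rows`, `not_in_rows`, and the two `*_query` names are chosen here for illustration. The `DataFrame.query` lines are an alternative that is not part of the answer above, included on the assumption that a string expression closer to SQL may be preferred.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
countries_to_keep = ['UK', 'China']

# Boolean mask: True where the value is one of the listed countries.
mask = df['country'].isin(countries_to_keep)

# SQL "IN": keep the rows where the mask is True.
in_rows = df[mask]

# SQL "NOT IN": invert the mask with ~ and keep the remaining rows.
not_in_rows = df[~mask]

# Alternative (not from the answer): DataFrame.query accepts `in` and
# `not in`; the @ prefix refers to a Python variable in the calling scope.
in_rows_query = df.query('country in @countries_to_keep')
not_in_rows_query = df.query('country not in @countries_to_keep')

print(in_rows)
print(not_in_rows)
```

Storing the mask once and reusing it for both filters avoids calling `isin` twice. The parentheses matter when the mask is built inline with other conditions, because `~` binds more tightly than `&` and `|`.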

149

answered Oct 16 '22 02:10

DSM

Tags:

python

pandas

dataframe

sql-function
