 

pandas - filter dataframe by another dataframe by row elements

I have a dataframe df1 which looks like:

   c  k  l
0  A  1  a
1  A  2  b
2  B  2  a
3  C  2  a
4  C  2  d

and another called df2 like:

   c  l
0  A  b
1  C  a

I would like to filter df1, keeping only the rows whose (c, l) values ARE NOT in df2. The values to filter out are the (A, b) and (C, a) tuples. So far I have tried the isin method:

d = df1[~(df1['l'].isin(df2['l']) & df1['c'].isin(df2['c']))]

That seems too complicated to me, and it returns:

   c  k  l
2  B  2  a
4  C  2  d

but I'm expecting:

   c  k  l
0  A  1  a
2  B  2  a
4  C  2  d
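If I check the column-wise masks (a quick sketch reusing the frames above), I can see why row 0 gets dropped: 'A' appears somewhere in df2['c'] and 'a' appears somewhere in df2['l'], even though the pair (A, a) is not a row of df2.

import pandas as pd

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
                    'l': ['b', 'a']})

# each mask is computed per column, so the pairing between c and l is lost
mask = df1['l'].isin(df2['l']) & df1['c'].isin(df2['c'])
print(mask.tolist())  # [True, True, False, True, False] -> row 0 is (wrongly) marked for removal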
asked Oct 22 '15 by Fabio Lamanna




2 Answers

You can do this efficiently using isin on a multiindex constructed from the desired columns:

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
                    'l': ['b', 'a']})

keys = list(df2.columns.values)
i1 = df1.set_index(keys).index
i2 = df2.set_index(keys).index
df1[~i1.isin(i2)]

which returns:

   c  k  l
0  A  1  a
2  B  2  a
4  C  2  d

I think this improves on @IanS's similar solution because it doesn't assume any column type (i.e. it will work with numbers as well as strings).
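As a quick illustration of that point, here is a minimal sketch (with made-up numeric data, not from the question) showing the same MultiIndex/isin pattern applied to numeric key columns:

import pandas as pd

# hypothetical frames whose key columns are numbers rather than strings
df_a = pd.DataFrame({'x': [1, 1, 2, 3], 'y': [10, 20, 10, 30], 'val': ['p', 'q', 'r', 's']})
df_b = pd.DataFrame({'x': [1, 3], 'y': [20, 30]})

keys = ['x', 'y']
ia = df_a.set_index(keys).index
ib = df_b.set_index(keys).index
df_a[~ia.isin(ib)]  # keeps only the rows whose (x, y) pair is not in df_b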


(The answer above is an edit; what follows was my initial answer.)

Interesting! This is something I haven't come across before... I would probably solve it by merging the two dataframes, then dropping the rows where df2 is defined. Here is an example, which makes use of a temporary marker column:

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
                    'l': ['b', 'a']})

# create a column marking df2 values
df2['marker'] = 1

# join the two, keeping all of df1's indices
joined = pd.merge(df1, df2, on=['c', 'l'], how='left')
joined

which gives:

   c  k  l  marker
0  A  1  a     NaN
1  A  2  b     1.0
2  B  2  a     NaN
3  C  2  a     1.0
4  C  2  d     NaN

# extract desired columns where marker is NaN
joined[pd.isnull(joined['marker'])][df1.columns]

which returns the desired rows:

   c  k  l
0  A  1  a
2  B  2  a
4  C  2  d

There may be a way to do this without the temporary marker column, but I can't think of one. As long as your data isn't huge, the above method should be fast and sufficient.
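For what it's worth, newer pandas versions let you skip the temporary marker column via merge's indicator argument; a minimal sketch under that assumption, reusing the same df1 and df2:

import pandas as pd

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
                    'l': ['b', 'a']})

# indicator=True adds a '_merge' column saying which frame each row came from
joined = df1.merge(df2, on=['c', 'l'], how='left', indicator=True)

# 'left_only' rows are those with no matching (c, l) pair in df2
joined[joined['_merge'] == 'left_only'].drop(columns='_merge')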

answered Sep 26 '22 by jakevdp


This is pretty succinct and works well:

df1 = df1[~df1.index.isin(df2.index)] 
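Note that this compares the frames' indexes rather than their column values, so for the (c, l)-pair filtering asked about above it only gives the expected result if both frames are first indexed by those key columns; a small sketch of that assumption:

import pandas as pd

df1 = pd.DataFrame({'c': ['A', 'A', 'B', 'C', 'C'],
                    'k': [1, 2, 2, 2, 2],
                    'l': ['a', 'b', 'a', 'a', 'd']})
df2 = pd.DataFrame({'c': ['A', 'C'],
                    'l': ['b', 'a']})

# index both frames by the key columns so "same index" means "same (c, l) pair"
a = df1.set_index(['c', 'l'])
b = df2.set_index(['c', 'l'])
a[~a.index.isin(b.index)].reset_index()[df1.columns]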
answered Sep 26 '22 by Haroon Hassan