How to remove rows in a Pandas dataframe if the same row exists in another dataframe?

Tags: python, pandas

I have two dataframes:

df1 = row1;row2;row3
df2 = row4;row5;row6;row2

I want my output dataframe to contain only the rows of df1 that do not also appear in df2, i.e.:

df_out = row1;row3

How do I get this most efficiently?

This code does what I want, but using 2 for-loops:

import pandas as pd

a = pd.DataFrame({0:[1,2,3],1:[10,20,30]})
b = pd.DataFrame({0:[0,1,2,3],1:[0,1,20,3]})

match_ident = []
for i in range(0,len(a)):
    found=False
    for j in range(0,len(b)):
        if a[0][i]==b[0][j]:
            if a[1][i]==b[1][j]:
                found=True
    match_ident.append(not(found))

a = a[match_ident]

asked Jun 22 '17 18:06

RRC

1 Answer

You can use merge with the indicator parameter and an outer join, query to filter the result, and then drop to remove the helper column:

The DataFrames are joined on all columns they share (here, all of them), so the on parameter can be omitted.

print (pd.merge(a,b, indicator=True, how='outer')
         .query('_merge=="left_only"')
         .drop('_merge', axis=1))

   0   1
0  1  10
2  3  30
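For reuse, the same idea can be wrapped in a small function. This is only a sketch under a few assumptions: rows_only_in_left is a hypothetical helper name that does not come from the answer, it uses a left join instead of the outer join (rows found only in the right frame are thrown away by the filter anyway), and it resets the index because merge does not keep the original row labels.

```python
import pandas as pd


def rows_only_in_left(left, right):
    """Return the rows of `left` that have no identical row in `right`.

    Hypothetical helper for illustration; both frames are assumed to
    have the same columns, so merge joins on all of them by default.
    """
    # indicator=True adds a "_merge" column with the values
    # "left_only", "right_only" or "both" for every merged row.
    merged = left.merge(right.drop_duplicates(), how="left", indicator=True)
    # Keep rows seen only in `left`, then remove the helper column.
    only_left = merged.loc[merged["_merge"] == "left_only"]
    return only_left.drop(columns="_merge").reset_index(drop=True)


a = pd.DataFrame({0: [1, 2, 3], 1: [10, 20, 30]})
b = pd.DataFrame({0: [0, 1, 2, 3], 1: [0, 1, 20, 3]})

result = rows_only_in_left(a, b)
print(result)
```

With the frames from the question, the rows (1, 10) and (3, 30) remain, as in the output above. Calling drop_duplicates on the right frame is a precaution so that repeated rows there cannot multiply rows during the join.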

answered Sep 19 '22 00:09

jezrael
