How to remove rows from a Pandas DataFrame if the same row exists in another DataFrame, but end up with all columns from both DataFrames

Tags: python, pandas

I have two different Pandas DataFrames that have one column in common. I have seen similar questions on Stack Overflow, but none that seem to end up with the columns from both DataFrames, so please read below before marking as a duplicate.

Example:

dataframe 1

ID  col1 col2  ...
1    9    5
2    8    4
3    7    3 
4    6    2

dataframe 2

ID  col3  col4  ...
3    11     15
4    12     16
7    13     17

What I want to achieve is a DataFrame with the columns from both DataFrames but without the IDs found in dataframe 2, i.e.:

desired result:

ID  col1 col2  col3  col4
1    9    5     -     -
2    8    4     -     -

Thanks!

asked Jan 16 '19 14:01

user8322222

2 Answers

Looks like a simple drop will work for what you want:

df1.drop(df2.index, errors='ignore', axis=0)

     col1  col2
ID            
1      9     5
2      8     4

Note that this assumes that ID is the index; otherwise, use .isin:

df1[~df1.ID.isin(df2.ID)]

    ID  col1  col2
0   1     9     5
1   2     8     4
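
As a minimal sketch (not part of the original answer), both variants can be run end to end on the question's sample data. The final reindex step, which appends df2's columns as empty (NaN) columns so the shape matches the desired result, is an added assumption about what the asker wants; the names by_index, by_column and result are purely illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"ID": [1, 2, 3, 4], "col1": [9, 8, 7, 6], "col2": [5, 4, 3, 2]})
df2 = pd.DataFrame({"ID": [3, 4, 7], "col3": [11, 12, 13], "col4": [15, 16, 17]})

# Variant 1: ID is the index. errors="ignore" skips labels that exist
# in df2 but not in df1 (here, ID 7) instead of raising a KeyError.
by_index = df1.set_index("ID").drop(df2.set_index("ID").index, errors="ignore")

# Variant 2: ID is a regular column. Keep rows whose ID is NOT in df2.
by_column = df1[~df1["ID"].isin(df2["ID"])]

# Assumed extra step: add df2's columns (filled with NaN) so the result
# carries the columns of both frames, as in the desired output.
all_columns = list(df1.columns) + [c for c in df2.columns if c not in df1.columns]
result = by_column.reindex(columns=all_columns)

print(result)
```

Both drop and isin only filter df1, so without the reindex step the result contains just ID, col1 and col2.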

answered Nov 03 '22 03:11

yatu

You can use a left join with indicator=True to keep only the IDs that appear in the first DataFrame and not in the second, while also keeping all of the second DataFrame's columns.

import pandas as pd

df1 = pd.DataFrame(
    data={"id": [1, 2, 3, 4], "col1": [9, 8, 7, 6], "col2": [5, 4, 3, 2]},
    columns=["id", "col1", "col2"],
)
df2 = pd.DataFrame(
    data={"id": [3, 4, 7], "col3": [11, 12, 13], "col4": [15, 16, 17]},
    columns=["id", "col3", "col4"],
)

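# Left join: every row of df1 is kept; "_merge" records where each row was found.
df_1_2 = df1.merge(df2, on="id", how="left", indicator=True)

# Keep rows found only in df1, then drop the helper column.
df_1_not_2 = df_1_2[df_1_2["_merge"] == "left_only"].drop(columns=["_merge"])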

which returns

   id  col1  col2  col3  col4
0   1     9     5   NaN   NaN
1   2     8     4   NaN   NaN
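
The same pattern can be wrapped in a small reusable function. This is only a sketch: anti_join is a hypothetical helper name, not a pandas API, and it assumes the key column has the same name in both frames. With indicator=True, pandas adds a "_merge" column whose values are "left_only", "right_only" or "both"; a left join never produces "right_only", so filtering on "left_only" is enough.

```python
import pandas as pd


def anti_join(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical helper: rows of `left` whose `key` is absent from `right`.

    The columns of `right` are kept in the result and filled with NaN.
    """
    merged = left.merge(right, on=key, how="left", indicator=True)
    only_left = merged[merged["_merge"] == "left_only"]
    return only_left.drop(columns="_merge").reset_index(drop=True)


df1 = pd.DataFrame({"id": [1, 2, 3, 4], "col1": [9, 8, 7, 6], "col2": [5, 4, 3, 2]})
df2 = pd.DataFrame({"id": [3, 4, 7], "col3": [11, 12, 13], "col4": [15, 16, 17]})

result = anti_join(df1, df2, key="id")
print(result)
```

Note that col3 and col4 come back as float columns, because NaN cannot be stored in a plain integer column.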

answered Nov 03 '22 05:11

kfoley
