I have two different Pandas data-frames that have one column in common. I have seen similar questions on Stack overflow but none that seem to end up with the columns from both dataframes so please read below before marking as duplicate.
Example:
dataframe 1
ID col1 col2 ...
1 9 5
2 8 4
3 7 3
4 6 2
dataframe 2
ID col3 col4 ...
3 11 15
4 12 16
7 13 17
What I want to achieve is a dataframe with columns from both dataframes but without the ID's found in dataframe2. i.e:
desired result:
ID col1 col2 col3 col4
1 9 5 - -
2 8 4 - -
Thanks!
To remove rows from a data frame that exists in another data frame, we can use subsetting with single square brackets. This removal will help us to find the unique rows in the data frame based on the column of another data frame.
The Pandas dataframe drop() method takes single or list label names and delete corresponding rows and columns. The axis = 0 is for rows and axis =1 is for columns. In this example, we are deleting the row that 'mark' column has value =100 so three rows are satisfying the condition.
Looks like a simple drop
will work for what you want:
df1.drop(df2.index, errors='ignore', axis=0)
col1 col2
ID
1 9 5
2 8 4
Note that this assumes that ID
is the index, otherwise use .isin
:
df1[~df1.ID.isin(df2.ID)]
ID col1 col2
0 1 9 5
1 2 8 4
You can use a left join to get only the id
's in the first data frame and not the second data frame while also keeping all the second data frames columns.
import pandas as pd
df1 = pd.DataFrame(
data={"id": [1, 2, 3, 4], "col1": [9, 8, 7, 6], "col2": [5, 4, 3, 2]},
columns=["id", "col1", "col2"],
)
df2 = pd.DataFrame(
data={"id": [3, 4, 7], "col3": [11, 12, 13], "col4": [15, 16, 17]},
columns=["id", "col3", "col4"],
)
df_1_2 = df1.merge(df2, on="id", how="left", indicator=True)
df_1_not_2 = df_1_2[df_1_2["_merge"] == "left_only"].drop(columns=["_merge"])
which returns
id col1 col2 col3 col4
0 1 9 5 NaN NaN
1 2 8 4 NaN NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With