I have two DataFrames that have the same column names with some matching data and some unique data.
I want to exclude the rows they share and keep only the rows that are unique to each DataFrame.
How would I concat or merge or join these two dataframes to do so?
For instance, picture a Venn diagram of the two DataFrames: I want both sides but not the overlap in the middle.
Here's my code right now:
import pandas as pd

def query_to_df(query):
    ...
    df_a = pd.DataFrame(data_a)
    df_b = pd.DataFrame(data_b)
    # axis=1 places the two frames side by side, aligned on the index
    outer_results = pd.concat([df_a, df_b], axis=1, join='outer')
    return outer_results
Let me give you an example of what I need:
df_a =
col_a col_b col_c
a1 b1 c1
a2 b2 c2
df_b =
col_a col_b col_c
a2 b2 c2
a3 b3 c3
# they only share the 2nd row: a2 b2 c2
# so the outer result should be:
col_a col_b col_c col_a col_b col_c
a1 b1 c1 NA NA NA
NA NA NA a3 b3 c3
Or I'd be just as happy with two DataFrames:
result_1 =
col_a col_b col_c
a1 b1 c1
result_2 =
col_a col_b col_c
a3 b3 c3
Lastly, you'll notice that a2 b2 c2 was excluded because all of the columns match. How do I specify that I want to join based on all the columns, not just one? If df_a had had a2 foo c2, I would have wanted that row to be in result_1 as well.
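Use pd.DataFrame.drop_duplicates with keep=False, which drops every row that appears more than once. By default it compares all columns.
This assumes the rows are unique within their respective DataFrames; a row repeated inside df_a alone would be dropped too.
pd.concat([df_a, df_b]).drop_duplicates(keep=False)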
col_a col_b col_c
0 a1 b1 c1
1 a3 b3 c3
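As a self-contained sketch of the approach above, the snippet below rebuilds the question's example frames and adds the hypothetical a2 foo c2 row to df_a. The variable name unique_rows is my own. Because drop_duplicates compares every column unless subset is given, the foo row differs from a2 b2 c2 and should survive.

```python
import pandas as pd

# Example data from the question, plus the hypothetical "a2 foo c2" row
df_a = pd.DataFrame({
    "col_a": ["a1", "a2", "a2"],
    "col_b": ["b1", "b2", "foo"],
    "col_c": ["c1", "c2", "c2"],
})
df_b = pd.DataFrame({
    "col_a": ["a2", "a3"],
    "col_b": ["b2", "b3"],
    "col_c": ["c2", "c3"],
})

# Stack the frames, then drop every row that occurs more than once.
# With no `subset` argument, all columns must match for rows to count
# as duplicates.
unique_rows = pd.concat([df_a, df_b]).drop_duplicates(keep=False)
print(unique_rows)
```

Passing subset=['col_a'] instead would treat rows as duplicates whenever col_a alone matches, which is not what the question asks for.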
You can also pass the keys parameter to pd.concat to record which DataFrame each row came from.
pd.concat([df_a, df_b], keys=['a', 'b']).drop_duplicates(keep=False)
col_a col_b col_c
a 0 a1 b1 c1
b 1 a3 b3 c3
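If two separate DataFrames are preferred, the keyed result can be split on its first index level. This is a minimal sketch, assuming the keys 'a' and 'b' from the call above; the names keyed, source, result_1 and result_2 are illustrative. Filtering with a boolean mask rather than .loc['a'] avoids a KeyError when one side has no unique rows left.

```python
import pandas as pd

df_a = pd.DataFrame({"col_a": ["a1", "a2"], "col_b": ["b1", "b2"], "col_c": ["c1", "c2"]})
df_b = pd.DataFrame({"col_a": ["a2", "a3"], "col_b": ["b2", "b3"], "col_c": ["c2", "c3"]})

# Tag each row with the frame it came from, then drop all shared rows
keyed = pd.concat([df_a, df_b], keys=["a", "b"]).drop_duplicates(keep=False)

# The first index level holds the key of the source frame
source = keyed.index.get_level_values(0)

# Boolean masks work even if one side ends up empty
result_1 = keyed[source == "a"].droplevel(0)
result_2 = keyed[source == "b"].droplevel(0)

print(result_1)
print(result_2)
```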