
Only outer join in Python pandas

I have two DataFrames that have the same column names with some matching data and some unique data.

I want to exclude the middle (the rows that appear in both) and keep only what is unique to each DataFrame.

How would I concat or merge or join these two dataframes to do so?

For instance, in this image I want both sides but not the middle:

[image not preserved]

Here's my code right now:

def query_to_df(query):
    ...
    df_a = pd.DataFrame(data_a)
    df_b = pd.DataFrame(data_b)
    outer_results = pd.concat([df_a, df_b], axis=1, join='outer')
    return outer_results

Let me give you an example of what I need:

df_a = 
col_a  col_b  col_c
   a1     b1     c1
   a2     b2     c2

df_b = 
col_a  col_b  col_c
   a2     b2     c2
   a3     b3     c3

# they only share the 2nd row:    a2     b2     c2 
# so the outer result should be:
col_a  col_b  col_c  col_a  col_b  col_c
   a1     b1     c1     NA     NA     NA
   NA     NA     NA     a3     b3     c3

Or I'd be just as happy with two DataFrames:

result_1 =
col_a  col_b  col_c
   a1     b1     c1

result_2 =
col_a  col_b  col_c
   a3     b3     c3

Lastly, you'll notice that a2 b2 c2 was excluded because all of its columns match. How do I specify that I want to join based on all the columns, not just one? If df_a had contained a2 foo c2, I would have wanted that row to be in result_1 as well.

Legit Stack asked Dec 14 '22 19:12


1 Answer

Use pd.DataFrame.drop_duplicates
This assumes the rows are unique within each DataFrame, because keep=False also drops rows that are duplicated inside a single DataFrame.

# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([df_a, df_b]).drop_duplicates(keep=False)

  col_a col_b col_c
0    a1    b1    c1
1    a3    b3    c3
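
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question. The names only_unique, df_a_foo and with_foo are made up for illustration. It also touches on the "all columns" part of the question: drop_duplicates compares every column unless you pass subset, so a row that differs in even one column is kept.

```python
import pandas as pd

# Sample data copied from the question
df_a = pd.DataFrame({"col_a": ["a1", "a2"], "col_b": ["b1", "b2"], "col_c": ["c1", "c2"]})
df_b = pd.DataFrame({"col_a": ["a2", "a3"], "col_b": ["b2", "b3"], "col_c": ["c2", "c3"]})

# keep=False drops every copy of a duplicated row, so the shared row disappears
only_unique = pd.concat([df_a, df_b]).drop_duplicates(keep=False)

# Hypothetical variant from the question: col_b is "foo", so the row no longer
# matches df_b on all columns and both versions of the a2 row survive
df_a_foo = pd.DataFrame({"col_a": ["a1", "a2"], "col_b": ["b1", "foo"], "col_c": ["c1", "c2"]})
with_foo = pd.concat([df_a_foo, df_b]).drop_duplicates(keep=False)

print(only_unique)
print(with_foo)
```

If you only wanted to compare some of the columns, you could pass them as subset=[...] to drop_duplicates.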

You can also pass the keys parameter to pd.concat to record which DataFrame each row came from.

pd.concat([df_a, df_b], keys=['a', 'b']).drop_duplicates(keep=False)

    col_a col_b col_c
a 0    a1    b1    c1
b 1    a3    b3    c3
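
If you would rather have the two separate DataFrames from the question (result_1 and result_2), one possible sketch is to filter the keyed result on its outer index level. As an alternative that does not rely on rows being unique within each frame, an outer merge with indicator=True should also work, since merge joins on all shared columns when on is not given. The names keyed, merged, left_only and right_only are just illustrative.

```python
import pandas as pd

# Sample data copied from the question
df_a = pd.DataFrame({"col_a": ["a1", "a2"], "col_b": ["b1", "b2"], "col_c": ["c1", "c2"]})
df_b = pd.DataFrame({"col_a": ["a2", "a3"], "col_b": ["b2", "b3"], "col_c": ["c2", "c3"]})

# Option 1: split the keyed concat result by its outer index level
keyed = pd.concat([df_a, df_b], keys=["a", "b"]).drop_duplicates(keep=False)
source = keyed.index.get_level_values(0)
result_1 = keyed[source == "a"].droplevel(0)  # rows found only in df_a
result_2 = keyed[source == "b"].droplevel(0)  # rows found only in df_b

# Option 2: outer merge on all shared columns; the _merge column says
# whether each row came from the left frame, the right frame, or both
merged = df_a.merge(df_b, how="outer", indicator=True)
left_only = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
right_only = merged.loc[merged["_merge"] == "right_only"].drop(columns="_merge")

print(result_1)
print(result_2)
```

The boolean filter in option 1 returns an empty DataFrame when one side has no unique rows, whereas a direct lookup such as keyed.loc["a"] would raise a KeyError in that case.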
piRSquared answered Dec 26 '22 10:12