
How to do the pandas equivalent of a SQL outer join without a key

In SQL, you can join two tables without a key so that every record of one table is paired with every record of the other (a cross join). If pandas.concat(), pandas.merge(), or some other pandas syntax supported this, it would help me with one step of a problem I am trying to solve. I found an outer join option in the documentation, but I could not find the exact syntax to do what I want (join all records without a key).

To explain this a little better:

import pandas as pd

lunchmenupairs2 = [["pizza", "italian"],["lasagna", "italian"],["orange", "fruit"]]
teamcuisinepreferences2 = [["ian", "*"]]

lunchLabels = ["Food", "Type"]
teamLabels = ["Person", "Type"]

df1 = pd.DataFrame.from_records(lunchmenupairs2, columns=lunchLabels)
df2 = pd.DataFrame.from_records(teamcuisinepreferences2, columns=teamLabels)

print(df1)
print(df2)

This prints the following tables:

      Food     Type
0    pizza  italian
1  lasagna  italian
2   orange    fruit

  Person     Type
0    ian        *

I want the final result of the merge to be:

  Person Type     Food     Type
0    ian    *    pizza  italian
1    ian    *  lasagna  italian
2    ian    *   orange    fruit

Then I can easily drop the columns I don't want and move to the next step in the code I am working on. This attempt does not give that result:

merged_data = pd.merge(left=df2,right=df1, how='outer')
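
For context, here is a small sketch of what that call does. With no `on` argument, `merge` joins on the columns the two frames have in common, which here is only `Type`. Because `*` never equals `italian` or `fruit`, no rows match, and the outer join keeps every row from both sides with NaN in the gaps. The data and names follow the question; row order may differ between pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame.from_records(
    [["pizza", "italian"], ["lasagna", "italian"], ["orange", "fruit"]],
    columns=["Food", "Type"],
)
df2 = pd.DataFrame.from_records([["ian", "*"]], columns=["Person", "Type"])

# No `on` given: pandas joins on the shared column "Type".
merged_data = pd.merge(left=df2, right=df1, how="outer")

print(merged_data)        # unmatched rows from both sides, padded with NaN
print(merged_data.shape)  # four rows (1 + 3), columns Person, Type, Food
```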

Is there a way to do this type of DataFrame merging?

TMWP asked May 26 '17 12:05



2 Answers

You can add a column with the same constant value to both DataFrames and merge on it:

>>> df1['joincol'] = 1
>>> df2['joincol'] = 1
>>> pd.merge(left=df2, right=df1, on='joincol', how='outer')
  Person Type_x  joincol     Food   Type_y
0    ian      *        1    pizza  italian
1    ian      *        1  lasagna  italian
2    ian      *        1   orange    fruit

Then delete the helper column afterward, when you remove your other undesired columns.
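
A self-contained sketch of the same constant-key idea, assuming you would rather not modify `df1` and `df2` in place. The `assign()` calls and the suffix names `_person` and `_food` are illustrative choices, not part of the answer above; `joincol` is the helper column name used there.

```python
import pandas as pd

df1 = pd.DataFrame.from_records(
    [["pizza", "italian"], ["lasagna", "italian"], ["orange", "fruit"]],
    columns=["Food", "Type"],
)
df2 = pd.DataFrame.from_records([["ian", "*"]], columns=["Person", "Type"])

# assign() returns copies, so df1 and df2 keep their original columns.
merged = pd.merge(
    df2.assign(joincol=1),
    df1.assign(joincol=1),
    on="joincol",                    # every row shares the same key value
    how="outer",
    suffixes=("_person", "_food"),   # hypothetical names for the two "Type" columns
).drop(columns="joincol")            # remove the helper key again

print(merged)
```

Because every row carries the same key, each row of `df2` is paired with each row of `df1`, so an inner join would give the same rows here.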

EFT answered Oct 20 '22 08:10


This is possible with a cross join (how='cross'), which merge() has supported since pandas 1.2.0. Simply run:

df1.merge(df2, how='cross')
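
A runnable sketch of the cross join using the question's data, assuming pandas 1.2.0 or later. Here `df2` is used as the left frame so that `Person` comes first, as in the desired output; with `df1` on the left, the same rows come back with the `Food` columns first. The duplicated `Type` column receives the default `_x` and `_y` suffixes.

```python
import pandas as pd

df1 = pd.DataFrame.from_records(
    [["pizza", "italian"], ["lasagna", "italian"], ["orange", "fruit"]],
    columns=["Food", "Type"],
)
df2 = pd.DataFrame.from_records([["ian", "*"]], columns=["Person", "Type"])

# how="cross" pairs every row of df2 with every row of df1; no key is needed.
merged = df2.merge(df1, how="cross")

print(merged)
print(list(merged.columns))
```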

Ran Feldesh answered Oct 20 '22 08:10