How would you perform a full outer join a cross join of two dataframes with no columns in common using pandas?
In MySQL, you can simply do:
SELECT *
FROM table_1
[CROSS] JOIN table_2;
But in pandas, doing:
df_1.merge(df_2, how='outer')
gives an error:
MergeError: No common columns to perform merge on
The best solution I have so far is using sqlite
:
import sqlalchemy as sa engine = sa.create_engine('sqlite:///tmp.db') df_1.to_sql('df_1', engine) df_2.to_sql('df_2', engine) df = pd.read_sql_query('SELECT * FROM df_1 JOIN df_2', engine)
In this example, we are using the pd. merge() function to join the two data frames by inner join. Now, add a suffix called 'remove' for newly joined columns that have the same name in both data frames. Use the drop() function to remove the columns with the suffix 'remove'.
Inner joins The most common type of join is called an inner join. An inner join combines two DataFrames based on a join key and returns a new DataFrame that contains only those rows that have matching values in both of the original DataFrames.
A Cross Join is a type of join that allows you to produce a Cartesian Product of rows in two or more tables. In other words, it combines rows from a first table with each row from a second table. The following table illustrates the result of Cross Join , when joining the Table df1 to another Table df2.
IIUC you need merge
with temporary columns tmp
of both DataFrames
:
import pandas as pd
df1 = pd.DataFrame({'fld1': ['x', 'y'],
'fld2': ['a', 'b1']})
df2 = pd.DataFrame({'fld3': ['y', 'x', 'y'],
'fld4': ['a', 'b1', 'c2']})
print df1
fld1 fld2
0 x a
1 y b1
print df2
fld3 fld4
0 y a
1 x b1
2 y c2
df1['tmp'] = 1
df2['tmp'] = 1
df = pd.merge(df1, df2, on=['tmp'])
df = df.drop('tmp', axis=1)
print df
fld1 fld2 fld3 fld4
0 x a y a
1 x a x b1
2 x a y c2
3 y b1 y a
4 y b1 x b1
5 y b1 y c2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With