
pandas cross join no columns in common [duplicate]

Tags: python, pandas

How would you perform a cross join of two DataFrames with no columns in common using pandas?

In MySQL, you can simply do:

SELECT *
FROM table_1
[CROSS] JOIN table_2;

But in pandas, doing:

df_1.merge(df_2, how='outer')

gives an error:

MergeError: No common columns to perform merge on

The best solution I have so far is using sqlite:

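import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine('sqlite:///tmp.db')
# index=False keeps each frame's index out of the tables,
# so SELECT * does not return two clashing "index" columns
df_1.to_sql('df_1', engine, index=False)
df_2.to_sql('df_2', engine, index=False)
df = pd.read_sql_query('SELECT * FROM df_1 JOIN df_2', engine)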
asked Feb 08 '16 08:02 by ostrokach



1 Answer

IIUC, you need to merge on a temporary column tmp added to both DataFrames:

import pandas as pd

df1 = pd.DataFrame({'fld1': ['x', 'y'],
                'fld2': ['a', 'b1']})


df2 = pd.DataFrame({'fld3': ['y', 'x', 'y'],
                'fld4': ['a', 'b1', 'c2']})

print(df1)
  fld1 fld2
0    x    a
1    y   b1

print(df2)
  fld3 fld4
0    y    a
1    x   b1
2    y   c2

df1['tmp'] = 1
df2['tmp'] = 1

df = pd.merge(df1, df2, on=['tmp'])
df = df.drop('tmp', axis=1)
print(df)
  fld1 fld2 fld3 fld4
0    x    a    y    a
1    x    a    x   b1
2    x    a    y   c2
3    y   b1    y    a
4    y   b1    x   b1
5    y   b1    y   c2
answered Sep 20 '22 05:09 by jezrael
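
Two variations on the temporary-key idea, offered as a rough sketch rather than as part of the answer above. The first builds the helper column with assign(), so df1 and df2 are left without a leftover tmp column. The second assumes pandas 1.2 or later, where merge accepts how='cross' directly. The names via_tmp and via_cross are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'fld1': ['x', 'y'],
                    'fld2': ['a', 'b1']})
df2 = pd.DataFrame({'fld3': ['y', 'x', 'y'],
                    'fld4': ['a', 'b1', 'c2']})

# Temporary-key approach without modifying the inputs:
# assign() returns copies that carry the helper column.
via_tmp = (pd.merge(df1.assign(tmp=1), df2.assign(tmp=1), on='tmp')
             .drop(columns='tmp'))

# Built-in cross join (assumption: pandas >= 1.2).
via_cross = df1.merge(df2, how='cross')

print(via_cross)
```

Both results should hold len(df1) * len(df2) rows, so either approach can get expensive in memory for large frames.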