 

Pandas dataframe Cartesian join

I have two pandas dataframes, and I would like to combine every row of the first dataframe with every row of the second, like this:

First:

val1 val2
1    2
0    0
2    1

Second:

l1 l2
a    a
b    c

Result (expected result size = len(first) * len(second)):

val1 val2 l1 l2
1    2    a    a
1    2    b    c
0    0    a    a
0    0    b    c
2    1    a    a
2    1    b    c

The two dataframes do not share an index.

Regards, Secau

asked Sep 07 '15 14:09 by secauvr


People also ask

How do you get cartesian product in pandas?

Step 1: First of all, import the library Pandas. Step 2: Then, obtain the datasets on which you want to perform a cartesian product. Step 3: Further, use a merge function to perform the cartesian product on the datasets obtained. Step 4: Finally, print the cartesian product obtained.
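
The four steps above could look like the following minimal sketch. It assumes pandas 1.2 or later, where merge accepts how="cross"; the names first, second, and result are illustrative, and the data is taken from the question.

```python
import pandas as pd  # Step 1: import pandas

# Step 2: the two datasets from the question.
first = pd.DataFrame({"val1": [1, 0, 2], "val2": [2, 0, 1]})
second = pd.DataFrame({"l1": ["a", "b"], "l2": ["a", "c"]})

# Step 3: cross merge (assumes pandas >= 1.2); no key column is needed.
result = first.merge(second, how="cross")

# Step 4: print the Cartesian product.
print(result)
```

The result should have len(first) * len(second) rows, with each row of first repeated once per row of second.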

How do I cross join two pandas DataFrames?

In pandas, the how parameter of merge and join selects a left, right, inner, or outer join of two DataFrames. Since pandas 1.2, both methods also accept how="cross", which produces the Cartesian product directly. On older versions there is no such option; instead, add a constant key column to both DataFrames and merge on that key.

What is the difference between join () and merge () in pandas?

Both join and merge can be used to combine two dataframes. The join method combines them on the basis of their indexes by default, whereas the merge method is more versatile and lets us specify columns besides the index to join on for both dataframes.
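
As a rough illustration of that difference, the sketch below applies both methods to the same pair of frames. The frames left and right and their values are made up for this example.

```python
import pandas as pd

# Two hypothetical frames whose indexes only partly overlap.
left = pd.DataFrame({"key": ["x", "y"], "a": [1, 2]}, index=[10, 11])
right = pd.DataFrame({"key": ["y", "x"], "b": [3, 4]}, index=[11, 12])

# join aligns rows on the index (left join by default); suffixes are
# needed here because both frames have a column named "key".
by_index = left.join(right, lsuffix="_left", rsuffix="_right")

# merge aligns rows on the named column (inner join by default).
by_column = left.merge(right, on="key")

print(by_index)
print(by_column)
```

Here join pairs only the rows sharing index 11, while merge pairs rows by the value in key regardless of index.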

What is cross join in pandas?

A cross join is a type of join that produces the Cartesian product of the rows of two or more tables. In other words, it combines every row of the first table with every row of the second table, so joining df1 to df2 yields len(df1) * len(df2) rows.


1 Answer

Add a surrogate key column with the same constant value to both dataframes, then merge on it to get the Cartesian join:

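import pandas as pd

# 'tmp' is a constant surrogate key: every row gets the value 1.
df1 = pd.DataFrame({'A': [1, 0, 2],
                    'B': [2, 0, 1],
                    'tmp': 1})

df2 = pd.DataFrame({'l1': ['a', 'b'],
                    'l2': ['a', 'c'],
                    'tmp': 1})

# Every row of df1 matches every row of df2 on 'tmp'.
print(pd.merge(df1, df2, on='tmp', how='outer'))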

Result:

   A  B  tmp l1 l2 
0  1  2    1  a  a 
1  1  2    1  b  c 
2  0  0    1  a  a 
3  0  0    1  b  c 
4  2  1    1  a  a 
5  2  1    1  b  c
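
If the surrogate key should not end up in the inputs or in the result, one possible variant is to add it only for the duration of the merge and drop it afterwards. This is a sketch rather than part of the original answer; the column name "_tmp" is an arbitrary choice and is assumed not to clash with an existing column.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 0, 2], "B": [2, 0, 1]})
df2 = pd.DataFrame({"l1": ["a", "b"], "l2": ["a", "c"]})

# assign() returns copies, so df1 and df2 themselves are left unchanged.
# "_tmp" is a hypothetical key name used only for this merge.
product = (
    df1.assign(_tmp=1)
    .merge(df2.assign(_tmp=1), on="_tmp")
    .drop(columns="_tmp")
)

# Every row of df1 is paired with every row of df2.
assert len(product) == len(df1) * len(df2)
print(product)
```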

answered Oct 20 '22 07:10 by Brian Pendleton