Create combination of two pandas dataframes in two dimensions

Tags: python, pandas

I have two pandas dataframes, df1 and df2. I want to create a dataframe df3 that contains every combination of a row from df1 with a row from df2. An inefficient way of doing this would be something like:

import pandas as pd

rows = []
for _, i in df1.iterrows():
    for _, j in df2.iterrows():
        rows.append(pd.concat([i, j]))  # one row with the combined cols from df1 and df2
df3 = pd.DataFrame(rows)

Here's the format for df1:

df1_id    other_data_1    other_data_2
1         0               1
2         1               5

df2:

df2_id    other_data_3    other_data_4
1         0               1
3         2               2

And the goal is to get this output for df3:

df1_id    df2_id    other_data_1    other_data_2    other_data_3    other_data_4
1         1         0               1               0               1
1         3         0               1               2               2
2         1         1               5               0               1
2         3         1               5               2               2
Andrew Ng asked Apr 06 '17 15:04


1 Answer

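Update for pandas 1.2.0+: merge accepts how='cross', which returns the Cartesian product of the two dataframes directly: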

df1.merge(df2, how='cross')
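
As a self-contained sketch of the cross merge, the snippet below rebuilds the sample data from the question and runs it end to end. The final column reordering is an optional extra, added here only to match the column layout the question asks for; the id_cols name is made up for this illustration.

```python
import pandas as pd

# Sample frames copied from the question
df1 = pd.DataFrame({"df1_id": [1, 2], "other_data_1": [0, 1], "other_data_2": [1, 5]})
df2 = pd.DataFrame({"df2_id": [1, 3], "other_data_3": [0, 2], "other_data_4": [1, 2]})

# Cartesian product: every row of df1 paired with every row of df2 (needs pandas >= 1.2.0)
df3 = df1.merge(df2, how="cross")

# Optional: move both id columns to the front, as in the question's target layout
id_cols = ["df1_id", "df2_id"]  # hypothetical helper name, not from the original answer
df3 = df3[id_cols + [c for c in df3.columns if c not in id_cols]]
print(df3)
```

The result has len(df1) * len(df2) rows, so this gets large quickly for big inputs.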

On older pandas versions, set a common key between the two dataframes and use pd.merge:

df1['key'] = 1
df2['key'] = 1

Merge and drop key column:

df3 = pd.merge(df1,df2,on='key').drop('key',axis=1)
df3

Output:

   df1_id  other_data_1  other_data_2  df2_id  other_data_3  other_data_4
0       1             0             1       1             0             1
1       1             0             1       3             2             2
2       2             1             5       1             0             1
3       2             1             5       3             2             2
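
Note that assigning df1['key'] = 1 leaves the helper column on the original dataframes. One possible variant, sketched here with the sample data from the question, adds the key through DataFrame.assign so that df1 and df2 are not modified. This is an alternative formulation, not part of the original answer.

```python
import pandas as pd

# Sample frames copied from the question
df1 = pd.DataFrame({"df1_id": [1, 2], "other_data_1": [0, 1], "other_data_2": [1, 5]})
df2 = pd.DataFrame({"df2_id": [1, 3], "other_data_3": [0, 2], "other_data_4": [1, 2]})

# assign() returns copies carrying the constant key, so the inputs stay unchanged
df3 = pd.merge(df1.assign(key=1), df2.assign(key=1), on="key").drop(columns="key")
print(df3)
```

Because every row shares the same key value, the merge pairs each row of df1 with each row of df2, which is the same Cartesian product as before.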
Scott Boston answered Sep 21 '22 17:09