I have two pandas dataframes, df1 and df2. I want to create a dataframe df3 that contains every combination of a row from df1 with a row from df2, identified by one id column from each. Inefficient pseudocode for this would be something like:
df3 = []
for i in df1:
    for j in df2:
        df3.append(i + j)  # where i + j is the row with the combined cols from df1 and df2
Here's the format for df1:
df1_id  other_data_1  other_data_2
1       0             1
2       1             5
df2:
df2_id  other_data_3  other_data_4
1       0             1
3       2             2
And the goal is to get this output for df3:
df1_id  df2_id  other_data_1  other_data_2  other_data_3  other_data_4
1       1       0             1             0             1
1       3       0             1             2             2
2       1       1             5             0             1
2       3       1             5             2             2
Pandas' merge and concat can be used to combine subsets of a DataFrame, or even data from different files. The join method combines DataFrames based on the index or on a key column. Two DataFrames can be joined in several ways (left, right, inner, or outer), depending on which rows must appear in the final DataFrame.
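To make the join types concrete, here is a small sketch using the df1 and df2 from the question. Matching df1_id against df2_id is only an assumption for illustration (the question itself wants all combinations, not a key match):

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"df1_id": [1, 2], "other_data_1": [0, 1], "other_data_2": [1, 5]})
df2 = pd.DataFrame({"df2_id": [1, 3], "other_data_3": [0, 2], "other_data_4": [1, 2]})

# Illustrative only: treat df1_id and df2_id as the matching keys
results = {
    how: pd.merge(df1, df2, left_on="df1_id", right_on="df2_id", how=how)
    for how in ("inner", "left", "right", "outer")
}

for how, frame in results.items():
    # inner keeps only ids present in both; left/right keep all rows of one side;
    # outer keeps every id from either side and fills the gaps with NaN
    print(how, len(frame))
```

None of these four modes yields every row pairing, which is why the answer below uses a cross merge instead.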
R has a built-in function, merge, that can combine two data frames of different lengths.
Columns from different DataFrames can also be joined with the concat() function. Its main arguments are the DataFrames to combine; axis, where 0 refers to the row axis and 1 refers to the column axis; and join, the type of join.
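A minimal sketch of what the axis argument does, again using the question's sample data. Note that concat aligns on labels, so it stacks or places frames side by side rather than producing row combinations:

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"df1_id": [1, 2], "other_data_1": [0, 1], "other_data_2": [1, 5]})
df2 = pd.DataFrame({"df2_id": [1, 3], "other_data_3": [0, 2], "other_data_4": [1, 2]})

# axis=0 stacks rows; columns missing from one frame are filled with NaN
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligned on the row index
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked.shape)
print(side_by_side.shape)
```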
Update (pandas 1.2.0+): use how='cross':
df1.merge(df2, how='cross')
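As a self-contained sketch of the cross merge, with the sample data from the question (the final column reordering is optional and only there to match the layout the question asks for):

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"df1_id": [1, 2], "other_data_1": [0, 1], "other_data_2": [1, 5]})
df2 = pd.DataFrame({"df2_id": [1, 3], "other_data_3": [0, 2], "other_data_4": [1, 2]})

# how="cross" (pandas >= 1.2.0) pairs every row of df1 with every row of df2
df3 = df1.merge(df2, how="cross")

# Optional: put both id columns first, as in the question's desired output
df3 = df3[["df1_id", "df2_id", "other_data_1", "other_data_2",
           "other_data_3", "other_data_4"]]
print(df3)
```

The result has len(df1) * len(df2) rows, so this can grow quickly for large inputs.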
Set a common key between the two dataframes and use pd.merge:
df1['key'] = 1
df2['key'] = 1
Merge and drop key column:
df3 = pd.merge(df1, df2, on='key').drop('key', axis=1)
df3
Output:
   df1_id  other_data_1  other_data_2  df2_id  other_data_3  other_data_4
0       1             0             1       1             0             1
1       1             0             1       3             2             2
2       2             1             5       1             0             1
3       2             1             5       3             2             2
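One side effect of the key approach as written is that df1 and df2 keep the extra key column afterwards. A possible variant, sketched here with the question's sample data, uses assign() so the temporary key exists only on copies:

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"df1_id": [1, 2], "other_data_1": [0, 1], "other_data_2": [1, 5]})
df2 = pd.DataFrame({"df2_id": [1, 3], "other_data_3": [0, 2], "other_data_4": [1, 2]})

# assign() returns new DataFrames, so df1 and df2 are left unchanged
df3 = (
    df1.assign(key=1)
    .merge(df2.assign(key=1), on="key")  # every row shares key == 1, so all rows pair up
    .drop(columns="key")
)
print(df3)
```

This should also work on pandas versions older than 1.2.0, where how='cross' is not available.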