In SQL, you can join two tables without a key so that every record of one table is paired with every record of the other. If pandas.concat(), pandas.merge(), or some other pandas syntax supported this, it would help me with one step of a problem I am trying to solve. I found an outer join option in the documentation, but I could not find the exact syntax to do what I want (join all records without a key).
To explain this a little better:
import pandas as pd
lunchmenupairs2 = [["pizza", "italian"],["lasagna", "italian"],["orange", "fruit"]]
teamcuisinepreferences2 = [["ian", "*"]]
lunchLabels = ["Food", "Type"]
teamLabels = ["Person", "Type"]
df1 = pd.DataFrame.from_records(lunchmenupairs2, columns=lunchLabels)
df2 = pd.DataFrame.from_records(teamcuisinepreferences2, columns=teamLabels)
print(df1)
print(df2)
Outputs these tables:
      Food     Type
0    pizza  italian
1  lasagna  italian
2   orange    fruit
  Person Type
0    ian    *
I want the final result of the merge to be:
  Person Type     Food     Type
0    ian    *    pizza  italian
1    ian    *  lasagna  italian
2    ian    *   orange    fruit
Then I can easily drop the columns I don't want and move to the next step in the code I am working on. This doesn't work:
merged_data = pd.merge(left=df2,right=df1, how='outer')
Is there a way to do this type of DataFrame merging?
You can add a column with the same constant value to both DataFrames and merge on it:
>>> df1['joincol'] = 1
>>> df2['joincol'] = 1
>>> pd.merge(left=df2, right=df1, on='joincol', how='outer')
  Person Type_x  joincol     Food   Type_y
0    ian      *        1    pizza  italian
1    ian      *        1  lasagna  italian
2    ian      *        1   orange    fruit
Then delete joincol afterward, when you remove your other undesired columns.
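For reference, here is a minimal sketch of the constant-key approach from start to finish, using the data from the question. The helper column name joincol comes from the answer; the variable name merged and the use of assign() and drop() are my own choices for illustration, not the only way to do it. It uses merge's default inner join, which should give the same rows as how='outer' here because every row carries the same key.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["pizza", "italian"], ["lasagna", "italian"], ["orange", "fruit"]],
    columns=["Food", "Type"],
)
df2 = pd.DataFrame([["ian", "*"]], columns=["Person", "Type"])

# assign() returns copies with the helper column, so df1 and df2 stay unchanged.
left = df2.assign(joincol=1)
right = df1.assign(joincol=1)

# Every row shares the same key, so each left row pairs with each right row.
merged = pd.merge(left, right, on="joincol").drop(columns="joincol")
print(merged)
```

The two Type columns come back as Type_x and Type_y because merge adds suffixes to overlapping column names.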
This is possible with a cross join, introduced in pandas 1.2.0. Simply run:
df1.merge(df2, how='cross')
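A short, self-contained sketch of the cross join on the question's data, assuming pandas 1.2.0 or later is installed. Here merge is called on df2 rather than df1 so that the Person columns come first, as in the desired output; the variable name result is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["pizza", "italian"], ["lasagna", "italian"], ["orange", "fruit"]],
    columns=["Food", "Type"],
)
df2 = pd.DataFrame([["ian", "*"]], columns=["Person", "Type"])

# how="cross" pairs every row of df2 with every row of df1; no key is passed.
result = df2.merge(df1, how="cross")
print(result)
```

The result has len(df1) * len(df2) rows, and the overlapping Type columns are suffixed as Type_x and Type_y.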