I have 2 dataframes:
restaurant_ids_dataframe
Data columns (total 13 columns): business_id 4503 non-null values categories 4503 non-null values city 4503 non-null values full_address 4503 non-null values latitude 4503 non-null values longitude 4503 non-null values name 4503 non-null values neighborhoods 4503 non-null values open 4503 non-null values review_count 4503 non-null values stars 4503 non-null values state 4503 non-null values type 4503 non-null values dtypes: bool(1), float64(3), int64(1), object(8)`
and
restaurant_review_frame
Int64Index: 158430 entries, 0 to 229905 Data columns (total 8 columns): business_id 158430 non-null values date 158430 non-null values review_id 158430 non-null values stars 158430 non-null values text 158430 non-null values type 158430 non-null values user_id 158430 non-null values votes 158430 non-null values dtypes: int64(1), object(7)
I would like to join these two DataFrames to make them into a single dataframe using the DataFrame.join() command in pandas.
I have tried the following line of code:
#the following line of code creates a left join of restaurant_ids_frame and restaurant_review_frame on the column 'business_id' restaurant_review_frame.join(other=restaurant_ids_dataframe,on='business_id',how='left')
But when I try this I get the following error:
Exception: columns overlap: Index([business_id, stars, type], dtype=object)
I am very new to pandas and have no clue what I am doing wrong as far as executing the join statement is concerned.
any help would be much appreciated.
Both dataframes has the different number of values but only common values in both the dataframes are displayed after merge. Example 2: In the resultant dataframe Grade column of df2 is merged with df1 based on key column Name with merge type left i.e. all the values of left dataframe (df1) will be displayed.
We can use the concat function in pandas to append either columns or rows from one DataFrame to another. Let's grab two subsets of our data to see how this works. When we concatenate DataFrames, we need to specify the axis. axis=0 tells pandas to stack the second DataFrame UNDER the first one.
It is possible to join the different columns is using concat() method.
You can use merge to combine two dataframes into one:
import pandas as pd pd.merge(restaurant_ids_dataframe, restaurant_review_frame, on='business_id', how='outer')
where on specifies field name that exists in both dataframes to join on, and how defines whether its inner/outer/left/right join, with outer using 'union of keys from both frames (SQL: full outer join).' Since you have 'star' column in both dataframes, this by default will create two columns star_x and star_y in the combined dataframe. As @DanAllan mentioned for the join method, you can modify the suffixes for merge by passing it as a kwarg. Default is suffixes=('_x', '_y')
. if you wanted to do something like star_restaurant_id
and star_restaurant_review
, you can do:
pd.merge(restaurant_ids_dataframe, restaurant_review_frame, on='business_id', how='outer', suffixes=('_restaurant_id', '_restaurant_review'))
The parameters are explained in detail in this link.
Joining fails if the DataFrames have some column names in common. The simplest way around it is to include an lsuffix
or rsuffix
keyword like so:
restaurant_review_frame.join(restaurant_ids_dataframe, on='business_id', how='left', lsuffix="_review")
This way, the columns have distinct names. The documentation addresses this very problem.
Or, you could get around this by simply deleting the offending columns before you join. If, for example, the stars in restaurant_ids_dataframe
are redundant to the stars in restaurant_review_frame
, you could del restaurant_ids_dataframe['stars']
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With