Combine two pandas Data Frames (join on a common column)

I have 2 dataframes:

restaurant_ids_dataframe

Data columns (total 13 columns):
business_id      4503  non-null values
categories       4503  non-null values
city             4503  non-null values
full_address     4503  non-null values
latitude         4503  non-null values
longitude        4503  non-null values
name             4503  non-null values
neighborhoods    4503  non-null values
open             4503  non-null values
review_count     4503  non-null values
stars            4503  non-null values
state            4503  non-null values
type             4503  non-null values
dtypes: bool(1), float64(3), int64(1), object(8)

and

restaurant_review_frame

Int64Index: 158430 entries, 0 to 229905
Data columns (total 8 columns):
business_id    158430  non-null values
date           158430  non-null values
review_id      158430  non-null values
stars          158430  non-null values
text           158430  non-null values
type           158430  non-null values
user_id        158430  non-null values
votes          158430  non-null values
dtypes: int64(1), object(7)

I would like to join these two DataFrames to make them into a single dataframe using the DataFrame.join() command in pandas.

I have tried the following line of code:

# the following line of code creates a left join of restaurant_review_frame and restaurant_ids_dataframe on the column 'business_id'
restaurant_review_frame.join(other=restaurant_ids_dataframe, on='business_id', how='left')

But when I try this I get the following error:

Exception: columns overlap: Index([business_id, stars, type], dtype=object)

I am very new to pandas and have no clue what I am doing wrong in the join statement.

Any help would be much appreciated.

asked Sep 13 '13 18:09

anonuser0428

2 Answers

You can use merge to combine two dataframes into one:

import pandas as pd
pd.merge(restaurant_ids_dataframe, restaurant_review_frame, on='business_id', how='outer')

Here, on specifies a field name that exists in both dataframes to join on, and how defines whether it's an inner, outer, left, or right join, with outer using 'union of keys from both frames (SQL: full outer join).'

Since you have a 'stars' column in both dataframes, this will by default create two columns, stars_x and stars_y, in the combined dataframe (the shared 'type' column is handled the same way). As @DanAllan mentioned for the join method, you can modify the suffixes for merge by passing them as a kwarg. The default is suffixes=('_x', '_y'). If you wanted something like stars_restaurant_id and stars_restaurant_review, you can do:

 pd.merge(restaurant_ids_dataframe, restaurant_review_frame, on='business_id', how='outer', suffixes=('_restaurant_id', '_restaurant_review'))

The parameters are explained in detail in the pandas documentation for merge.
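To make the suffix behaviour concrete, here is a minimal sketch on two tiny stand-in frames. The rows and values are invented for illustration and only a few of the real columns are included; the frame names match the question.

```python
import pandas as pd

# Toy stand-ins for the frames in the question (all values are invented).
restaurant_ids_dataframe = pd.DataFrame({
    "business_id": ["b1", "b2", "b3"],
    "name": ["Cafe A", "Diner B", "Grill C"],
    "stars": [4.0, 3.5, 5.0],
    "type": ["business", "business", "business"],
})
restaurant_review_frame = pd.DataFrame({
    "business_id": ["b1", "b1", "b2"],
    "review_id": ["r1", "r2", "r3"],
    "stars": [5, 3, 4],
    "type": ["review", "review", "review"],
})

# Default suffixes: overlapping non-key columns become stars_x / stars_y, etc.
default = pd.merge(restaurant_ids_dataframe, restaurant_review_frame,
                   on="business_id", how="outer")

# Custom suffixes give the overlapping columns more descriptive names.
custom = pd.merge(restaurant_ids_dataframe, restaurant_review_frame,
                  on="business_id", how="outer",
                  suffixes=("_restaurant_id", "_restaurant_review"))

print(list(default.columns))
print(list(custom.columns))
```

With how='outer', the restaurant without any reviews (b3 here) is kept, and its review columns are filled with NaN.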

answered Sep 28 '22 06:09

mlimb

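Joining fails if the DataFrames have some column names in common. The simplest way around it is to include an lsuffix or rsuffix keyword. Note also that join matches the on column of the calling frame against the index of the other frame, so restaurant_ids_dataframe should be indexed by business_id first:

restaurant_review_frame.join(restaurant_ids_dataframe.set_index('business_id'), on='business_id', how='left', lsuffix="_review")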

This way, the columns have distinct names. The documentation addresses this very problem.
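As a rough illustration of the suffix approach with join, the sketch below uses two small invented frames named after the ones in the question. It assumes the restaurant frame is indexed by business_id before joining, because DataFrame.join aligns the caller's on column with the other frame's index.

```python
import pandas as pd

# Toy stand-ins for the frames in the question (all values are invented).
restaurant_review_frame = pd.DataFrame({
    "business_id": ["b1", "b1", "b2"],
    "review_id": ["r1", "r2", "r3"],
    "stars": [5, 3, 4],
    "type": ["review", "review", "review"],
})
restaurant_ids_dataframe = pd.DataFrame({
    "business_id": ["b1", "b2", "b3"],
    "name": ["Cafe A", "Diner B", "Grill C"],
    "stars": [4.0, 3.5, 5.0],
    "type": ["business", "business", "business"],
})

# Index the right-hand frame by the key, then left-join on that key.
# lsuffix renames the overlapping columns that come from the review frame.
joined = restaurant_review_frame.join(
    restaurant_ids_dataframe.set_index("business_id"),
    on="business_id",
    how="left",
    lsuffix="_review",
)

print(joined)
```

Here the review frame's overlapping columns become stars_review and type_review, while the restaurant frame's stars and type keep their original names.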

Or, you could get around this by simply deleting the offending columns before you join. If, for example, the stars in restaurant_ids_dataframe are redundant with the stars in restaurant_review_frame, you could run del restaurant_ids_dataframe['stars'].
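A sketch of the column-dropping route, again on invented toy data. Instead of del, it uses drop(columns=...), which returns a new frame and leaves the original untouched; that substitution and the name restaurants_trimmed are choices made for this example only.

```python
import pandas as pd

# Toy stand-ins for the frames in the question (all values are invented).
restaurant_review_frame = pd.DataFrame({
    "business_id": ["b1", "b1", "b2"],
    "review_id": ["r1", "r2", "r3"],
    "stars": [5, 3, 4],
    "type": ["review", "review", "review"],
})
restaurant_ids_dataframe = pd.DataFrame({
    "business_id": ["b1", "b2", "b3"],
    "name": ["Cafe A", "Diner B", "Grill C"],
    "stars": [4.0, 3.5, 5.0],
    "type": ["business", "business", "business"],
})

# Drop the overlapping non-key columns from a copy of the restaurant frame.
restaurants_trimmed = restaurant_ids_dataframe.drop(columns=["stars", "type"])

# With no overlapping columns left, no suffix is needed.
joined = restaurant_review_frame.join(
    restaurants_trimmed.set_index("business_id"),
    on="business_id",
    how="left",
)

print(list(joined.columns))
```

Only drop a column this way if its values really are not needed in the combined result.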

answered Sep 28 '22 06:09

Dan Allan


Tags: python, merge, pandas, dataframe, left-join
