How to concatenate two DataFrames with different numbers of columns in pandas?

Tags:

pandas

I have the following Dataframes:

Dataframe 1:

|---------------------|------------------|
|      property_id    |        beds      |
|---------------------|------------------|
|          1          |         1        |
|---------------------|------------------|
|          2          |         2        | 
|---------------------|------------------|

Dataframe 2:

|---------------------| 
|      property_id    |
|---------------------|
|          3          |
|---------------------|
|          4          |
|---------------------|

What I want to produce is the following Dataframe:

|---------------------|------------------|
|      property_id    |        beds      |
|---------------------|------------------|
|          1          |         1        |
|---------------------|------------------|
|          2          |         2        | 
|---------------------|------------------|
|          3          |         0        |
|---------------------|------------------|
|          4          |         0        | 
|---------------------|------------------|

I want to concatenate two DataFrames where the first has more columns than the second, and every column of the second also exists in the first. Where a column is missing from the second DataFrame, I want to fill it with a default value of 0. How can I achieve this?

import pandas as pd

df1 = pd.DataFrame({'property_id': [1, 2], 'beds': [1, 2]})
df2 = pd.DataFrame({'property_id': [3, 4]})

I have almost no experience with pandas, so what could I do?


asked Apr 24 '17 02:04

lmiguelvargasf

2 Answers

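You can use pandas.concat or the DataFrame.append method for this. Both generate NaN for columns that don't exist in one of the data frames; to fill them with zero, chain fillna(0). Note that append was deprecated in pandas 1.4 and removed in 2.0, so on current versions only the pd.concat form will run:

df1.append(df2).fillna(0)  # pandas < 2.0 only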

#  beds     property_id
#0  1.0          1
#1  2.0          2
#0  0.0          3
#1  0.0          4


pd.concat([df1, df2]).fillna(0)

#  beds     property_id
#0  1.0         1
#1  2.0         2
#0  0.0         3
#1  0.0         4
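
As the output above shows, beds comes back as float because the column briefly holds NaN. If integer columns matter, one option is to cast back after filling. This is a minimal sketch using the frames from the question; ignore_index=True and the astype step are additions that are not part of the answer above:

```python
import pandas as pd

df1 = pd.DataFrame({'property_id': [1, 2], 'beds': [1, 2]})
df2 = pd.DataFrame({'property_id': [3, 4]})

# df2 has no 'beds' column, so those rows get NaN and the column becomes float.
# ignore_index=True (an addition here) renumbers the rows instead of repeating 0, 1.
combined = pd.concat([df1, df2], ignore_index=True)

# Fill the gaps with 0, then cast every column back to the dtype it had in df1.
result = combined.fillna(0).astype(df1.dtypes.to_dict())

print(result)
print(result.dtypes)
```

The cast assumes every column of the combined frame also exists in df1, which holds for the case described in the question.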

answered Oct 16 '22 21:10

Psidom

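# Originally written as df1.append(df2.reindex_axis(df1.columns, 1, fill_value=0));
# reindex_axis and append have since been removed from pandas, so the equivalent today is:
pd.concat([df1, df2.reindex(columns=df1.columns, fill_value=0)])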

The advantage is that integer dtypes should be preserved, because the missing column is filled with 0 before concatenating and so never holds NaN.
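
To check the dtype claim, here is a minimal sketch with the question's frames. It uses DataFrame.reindex(columns=...) together with pd.concat; the names df2_aligned and result, and the ignore_index=True argument, are illustrative additions rather than part of the answer:

```python
import pandas as pd

df1 = pd.DataFrame({'property_id': [1, 2], 'beds': [1, 2]})
df2 = pd.DataFrame({'property_id': [3, 4]})

# Give df2 the same columns as df1; columns it lacks are created and filled with 0.
df2_aligned = df2.reindex(columns=df1.columns, fill_value=0)

# No NaN is ever introduced, so the integer columns should not be upcast to float.
result = pd.concat([df1, df2_aligned], ignore_index=True)

print(result)
print(result.dtypes)
```

Compared with filling after the concatenation, this avoids a separate cast back to integers.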

answered Oct 16 '22 21:10

piRSquared
