I want to combine two dataframes. One dataframe, let's say Empty_DF, is empty and large (320 columns by 240 rows), with plain integers as index and column labels. The other one, ROI_DF, is smaller and filled, and its index and column labels match a particular region of Empty_DF.
I have tried to use the pandas.merge function as suggested in this question; however, it just appends the columns to the empty dataframe Empty_DF instead of replacing the values.
import pandas as pd

Empty_DF = pd.DataFrame({'a':[0,0,0,0,0,0],
                         'b':[0,0,0,0,0,0], 'c':[0,0,0,0,0,0]}, index=list('abcdef'))
print(Empty_DF)
ROI_DF = pd.DataFrame({'a':range(4),
                       'b':[5,6,7,8]}, index=list('abce'))
print(ROI_DF)
a b c
a 0 0 0
b 0 0 0
c 0 0 0
d 0 0 0
e 0 0 0
f 0 0 0
In this example, merging is sufficient since the dataframe is small and fillna combined with drop can be used to clean up the result. Is there a more efficient way that scales to bigger dataframes?
df3 = pd.merge(Empty_DF, ROI_DF, how='left', left_index=True,
               right_index=True, suffixes=('_x', ''))
# Assign the result back: fillna(inplace=True) on a selected column may not modify df3
df3['a'] = df3['a'].fillna(df3['a_x'])
df3['b'] = df3['b'].fillna(df3['b_x'])
df3 = df3.drop(['a_x', 'b_x'], axis=1)
print(df3)
a b c
a 0 5 0
b 1 6 0
c 2 7 0
d 0 0 0
e 3 8 0
f 0 0 0
This is a perfect case for DataFrame.update, which aligns on indices:
Empty_DF.update(ROI_DF)
Output:
print(Empty_DF)
a b c
a 0.0 5.0 0
b 1.0 6.0 0
c 2.0 7.0 0
d 0.0 0.0 0
e 3.0 8.0 0
f 0.0 0.0 0
Note that update works in place, as stated in the documentation:
Modify in place using non-NA values from another DataFrame.
That means your original dataframe is overwritten with the new values. To keep the original intact, update a copy instead:
df3 = Empty_DF.copy()
df3.update(ROI_DF)
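A minimal sketch of the copy-then-update pattern, using the frames from the question (Empty_DF is built here with a scalar fill, which is just a shorthand for the same all-zero frame). It also shows that update returns None, so its result should not be assigned to a variable:

```python
import pandas as pd

Empty_DF = pd.DataFrame(0, index=list('abcdef'), columns=list('abc'))
ROI_DF = pd.DataFrame({'a': range(4), 'b': [5, 6, 7, 8]}, index=list('abce'))

# Work on a copy so that Empty_DF itself is left unchanged.
df3 = Empty_DF.copy()
result = df3.update(ROI_DF)   # update modifies df3 and returns None

print(result)                 # None
print(Empty_DF['b'].tolist()) # still all zeros
print(df3['b'].tolist())      # ROI_DF values at rows a, b, c and e
```

Depending on the pandas version, the updated columns may come back as floats rather than integers, because the alignment step temporarily introduces NaN for rows that ROI_DF does not cover.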
You can either use update:
Empty_DF.update(ROI_DF)
output:
a b c
a 0.0 5.0 0
b 1.0 6.0 0
c 2.0 7.0 0
d 0.0 0.0 0
e 3.0 8.0 0
f 0.0 0.0 0
Or loc:
Empty_DF.loc[ROI_DF.index, ROI_DF.columns] = ROI_DF
output:
a b c
a 0 5 0
b 1 6 0
c 2 7 0
d 0 0 0
e 3 8 0
f 0 0 0
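For the larger case described in the question (integer labels, 240 rows by 320 columns), the loc assignment could look like the sketch below. The ROI size, its position and its values are hypothetical, chosen only for illustration. The sketch assumes every label of ROI_DF is already present in Empty_DF and checks that up front:

```python
import numpy as np
import pandas as pd

# Large all-zero frame with integer labels, as described in the question.
Empty_DF = pd.DataFrame(0, index=range(240), columns=range(320))

# Hypothetical 3x4 region of interest located at rows 100-102, columns 50-53.
ROI_DF = pd.DataFrame(
    np.arange(1, 13).reshape(3, 4),
    index=range(100, 103),
    columns=range(50, 54),
)

# Guard: this sketch only handles labels that exist in the target frame.
assert ROI_DF.index.isin(Empty_DF.index).all()
assert ROI_DF.columns.isin(Empty_DF.columns).all()

# Label-aligned assignment into the matching block.
Empty_DF.loc[ROI_DF.index, ROI_DF.columns] = ROI_DF

# Show the block together with one surrounding row and column of zeros.
print(Empty_DF.loc[99:103, 49:54])
```

Because the assignment touches only the selected block, the rest of the frame is left as it was.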
In your case, reindex_like also works:
yourdf=ROI_DF.reindex_like(Empty_DF).fillna(0)
a b c
a 0.0 5.0 0.0
b 1.0 6.0 0.0
c 2.0 7.0 0.0
d 0.0 0.0 0.0
e 3.0 8.0 0.0
f 0.0 0.0 0.0
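One thing to keep in mind with reindex_like: it takes only the labels from the other frame, not its values, so it fits here because Empty_DF holds nothing but zeros. The sketch below uses a hypothetical target frame filled with 9s to make the difference from update visible:

```python
import pandas as pd

# Hypothetical target that already holds non-zero data.
target = pd.DataFrame(9, index=list('abcdef'), columns=list('abc'))
ROI_DF = pd.DataFrame({'a': range(4), 'b': [5, 6, 7, 8]}, index=list('abce'))

# reindex_like borrows only the labels of target; its values are ignored.
rebuilt = ROI_DF.reindex_like(target).fillna(0)

# update keeps the values of target wherever ROI_DF has no data.
updated = target.copy()
updated.update(ROI_DF)

print(rebuilt['c'].tolist())  # zeros: the 9s from target are not carried over
print(updated['c'].tolist())  # the 9s from target are preserved
```

If the large frame may ever contain data that should survive, update or loc is the safer choice of the three.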