I have the following data frames (in reality they are more than 3).
import pandas as pd
df1 = pd.DataFrame({'head1': ['foo', 'bix', 'bar'],'val': [11, 22, 32]})
df2 = pd.DataFrame({'head2': ['foo', 'xoo', 'bar','qux'],'val': [1, 2, 3,10]})
df3 = pd.DataFrame({'head3': ['xoo', 'bar',],'val': [20, 100]})
# Note that the value in column 'head' is always unique
What I want to do is to merge them based on head column. And whenever the value of a head does not exist in one data frame we would assign it with NA.
In the end it'll look like this:
head1 head2 head3
-------------------------------
foo 11 1 NA
bix 22 NA NA
bar 32 3 100
xoo NA 2 20
qux NA 10 NA
How can I achieve that using Pandas?
You can use pandas.concat selecting the axis=1 to concatenate your multiple DataFrames.
Note however that I've first set the index of the df1, df2, df3 to use the variables (foo, bar, etc) rather than the default integers.
import pandas as pd
df1 = pd.DataFrame({'head1': ['foo', 'bix', 'bar'],'val': [11, 22, 32]})
df2 = pd.DataFrame({'head2': ['foo', 'xoo', 'bar','qux'],'val': [1, 2, 3,10]})
df3 = pd.DataFrame({'head3': ['xoo', 'bar',],'val': [20, 100]})
df1 = df1.set_index('head1')
df2 = df2.set_index('head2')
df3 = df3.set_index('head3')
df = pd.concat([df1, df2, df3], axis = 1)
columns = ['head1', 'head2', 'head3']
df.columns = columns
print(df)
head1 head2 head3
bar 32 3 100
bix 22 NaN NaN
foo 11 1 NaN
qux NaN 10 NaN
xoo NaN 2 20
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With