I have a list of dataframes
dfA:
item   a     A              
A      1     2 
B      1     3         
C      0     4         
dfB:
item   a     B
E      1     2
F      0     6
dfC:
item   a     C
G      1     3
H      0     4
I want to add the missing columns to each dataframe
This is what I want : dfA:
item   a     A    B    C           
A      1     2    0    0
B      1     3    0    0 
C      0     4    0    0 
dfB:
item   a     A   B    C
E      1     0   2    0
F      0     0   6    0
dfC:
item   a     A   B   C
G      1     0   0   3
H      0     0   0   4
                1) Take the union of each dataframe's columns.
col_list = list(set().union(dfA.columns, dfB.columns, dfC.columns))
col_list.sort()
['A', 'B', 'C', 'a']
2) Use the reindex function.
dfA2 = dfA.reindex(columns=col_list, fill_value=0)
   A  B  C  a
A  2  0  0  1
B  3  0  0  1
C  4  0  0  0
dfB2 = dfB.reindex(columns=col_list, fill_value=0)
   A  B  C  a
E  0  2  0  1
F  0  6  0  0
dfC2 = dfC.reindex(columns=col_list, fill_value=0)
   A  B  C  a
G  0  0  3  1
H  0  0  4  0
3) You can use reindex to drop, add, or duplicate columns.
dfA3 = dfA.reindex(columns=['C', 'A', 'A', 'D'], fill_value=0)
   C  A  A  D
A  0  2  2  0
B  0  3  3  0
C  0  4  4  0
                        You can create a combined column list like this
col_list = (df1.append([df2,df3])).columns.tolist()
Now add the columns to each dataframe
df1 = df1.loc[:, col_list].fillna(0)
print(df1)
    A   B   C   a   item
0   2   0.0 0.0 1   A
1   3   0.0 0.0 1   B
2   4   0.0 0.0 0   C
df2 = df2.loc[:, col_list].fillna(0)
print(df2)
    A   B   C   a   item
0   0.0 2   0.0 1   E
1   0.0 6   0.0 0   F
df3 = df3.loc[:, col_list].fillna(0)
print(df3)
    A   B   C   a   item
0   0.0 0.0 3   1   G
1   0.0 0.0 4   0   H
                        Option 1
Align both axes
With functools.partial  
from functool import partial
(_, dfA), (dfC, dfB) = list(map(
    partial(dfC.align, fill_value=0),
    dfA.align(dfB, fill_value=0)
))
Option 1B
Align columns only  
from functools import partial
(_, dfA), (dfC, dfB) = list(map(
    partial(dfC.align, fill_value=0, axis=1),
    dfA.align(dfB, fill_value=0, axis=1)
))
Option 2
Align both axes
With pd.DataFrame.reindex  
from functools import reduce    
lod = [dfA, dfB, dfC]
idx = reduce(pd.Index.union, (d.index for d in lod))
col = reduce(pd.Index.union, (d.columns for d in lod))
dfA, dfB, dfC = (d.reindex(idx, col, fill_value=0) for d in lod)
Option 2B
Align columns only  
lod = [dfA, dfB, dfC]
col = reduce(pd.Index.union, (d.columns for d in lod))
dfA, dfB, dfC = (d.reindex(columns=col, fill_value=0) for d in lod)
Setup
dfA = pd.DataFrame(**{
    'columns': ['item', 'a', 'A'],
    'data': [['A', 1, 2], ['B', 1, 3], ['C', 0, 4]],
    'index': [0, 1, 2]})
dfB = pd.DataFrame(**{
    'columns': ['item', 'a', 'B'],
    'data': [['E', 1, 2], ['F', 0, 6]],
    'index': [0, 1]})
dfC = pd.DataFrame(**{
    'columns': ['item', 'a', 'C'],
    'data': [['G', 1, 3], ['H', 0, 4]],
    'index': [0, 1]})
                        One way using merge by defining the order of dfA, dfB, dfC in reduce operation.
In [1932]: reduce(lambda l,r: pd.merge(l,r,on=['item', 'a'], how='left'),
                              [dfA, dfB, dfC]).fillna(0)
Out[1932]:
  item  a  A    B    C
0    A  1  2  0.0  0.0
1    B  1  3  0.0  0.0
2    C  0  4  0.0  0.0
In [1933]: reduce(lambda l,r: pd.merge(l,r,on=['item', 'a'], how='left'), 
                  [dfB, dfA, dfC]).fillna(0)
Out[1933]:
  item  a  B    A    C
0    E  1  2  0.0  0.0
1    F  0  6  0.0  0.0
In [1934]: reduce(lambda l,r: pd.merge(l,r,on=['item', 'a'], how='left'),
                  [dfC, dfA, dfB]).fillna(0)
Out[1934]:
  item  a  C    A    B
0    G  1  3  0.0  0.0
1    H  0  4  0.0  0.0
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With