I have a list of dataframes
dfA:
item a A
A 1 2
B 1 3
C 0 4
dfB:
item a B
E 1 2
F 0 6
dfC:
item a C
G 1 3
H 0 4
I want to add the missing columns to each dataframe
This is what I want : dfA:
item a A B C
A 1 2 0 0
B 1 3 0 0
C 0 4 0 0
dfB:
item a A B C
E 1 0 2 0
F 0 0 6 0
dfC:
item a A B C
G 1 0 0 3
H 0 0 0 4
1) Take the union of each dataframe's columns.
col_list = list(set().union(dfA.columns, dfB.columns, dfC.columns))
col_list.sort()
['A', 'B', 'C', 'a']
2) Use the reindex function.
dfA2 = dfA.reindex(columns=col_list, fill_value=0)
A B C a
A 2 0 0 1
B 3 0 0 1
C 4 0 0 0
dfB2 = dfB.reindex(columns=col_list, fill_value=0)
A B C a
E 0 2 0 1
F 0 6 0 0
dfC2 = dfC.reindex(columns=col_list, fill_value=0)
A B C a
G 0 0 3 1
H 0 0 4 0
3) You can use reindex to drop, add, or duplicate columns.
dfA3 = dfA.reindex(columns=['C', 'A', 'A', 'D'], fill_value=0)
C A A D
A 0 2 2 0
B 0 3 3 0
C 0 4 4 0
You can create a combined column list like this
col_list = (df1.append([df2,df3])).columns.tolist()
Now add the columns to each dataframe
df1 = df1.loc[:, col_list].fillna(0)
print(df1)
A B C a item
0 2 0.0 0.0 1 A
1 3 0.0 0.0 1 B
2 4 0.0 0.0 0 C
df2 = df2.loc[:, col_list].fillna(0)
print(df2)
A B C a item
0 0.0 2 0.0 1 E
1 0.0 6 0.0 0 F
df3 = df3.loc[:, col_list].fillna(0)
print(df3)
A B C a item
0 0.0 0.0 3 1 G
1 0.0 0.0 4 0 H
Option 1
Align both axes
With functools.partial
from functool import partial
(_, dfA), (dfC, dfB) = list(map(
partial(dfC.align, fill_value=0),
dfA.align(dfB, fill_value=0)
))
Option 1B
Align columns only
from functools import partial
(_, dfA), (dfC, dfB) = list(map(
partial(dfC.align, fill_value=0, axis=1),
dfA.align(dfB, fill_value=0, axis=1)
))
Option 2
Align both axes
With pd.DataFrame.reindex
from functools import reduce
lod = [dfA, dfB, dfC]
idx = reduce(pd.Index.union, (d.index for d in lod))
col = reduce(pd.Index.union, (d.columns for d in lod))
dfA, dfB, dfC = (d.reindex(idx, col, fill_value=0) for d in lod)
Option 2B
Align columns only
lod = [dfA, dfB, dfC]
col = reduce(pd.Index.union, (d.columns for d in lod))
dfA, dfB, dfC = (d.reindex(columns=col, fill_value=0) for d in lod)
Setup
dfA = pd.DataFrame(**{
'columns': ['item', 'a', 'A'],
'data': [['A', 1, 2], ['B', 1, 3], ['C', 0, 4]],
'index': [0, 1, 2]})
dfB = pd.DataFrame(**{
'columns': ['item', 'a', 'B'],
'data': [['E', 1, 2], ['F', 0, 6]],
'index': [0, 1]})
dfC = pd.DataFrame(**{
'columns': ['item', 'a', 'C'],
'data': [['G', 1, 3], ['H', 0, 4]],
'index': [0, 1]})
One way using merge
by defining the order of dfA, dfB, dfC
in reduce
operation.
In [1932]: reduce(lambda l,r: pd.merge(l,r,on=['item', 'a'], how='left'),
[dfA, dfB, dfC]).fillna(0)
Out[1932]:
item a A B C
0 A 1 2 0.0 0.0
1 B 1 3 0.0 0.0
2 C 0 4 0.0 0.0
In [1933]: reduce(lambda l,r: pd.merge(l,r,on=['item', 'a'], how='left'),
[dfB, dfA, dfC]).fillna(0)
Out[1933]:
item a B A C
0 E 1 2 0.0 0.0
1 F 0 6 0.0 0.0
In [1934]: reduce(lambda l,r: pd.merge(l,r,on=['item', 'a'], how='left'),
[dfC, dfA, dfB]).fillna(0)
Out[1934]:
item a C A B
0 G 1 3 0.0 0.0
1 H 0 4 0.0 0.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With