I have the following df:
import pandas as pd

list_columns = ['A', 'B', 'C']
list_data = [
    [1, '2', 3],
    [4, '4', 5],
    [1, '2', 3],
    [4, '4', 6]
]
df = pd.DataFrame(columns=list_columns, data=list_data)
I want to check whether several columns exist and create any that are missing.
Example: if B, C and D do not exist, create them (for the above df this would create only the D column). I know how to do this for one column:
if 'D' not in df:
    df['D'] = 0
Is there a way to test whether all of my columns exist and create only the ones that are missing, without writing an if for each column?
A loop is not necessary here; use DataFrame.reindex with Index.union:
cols = ['B','C','D']
df = df.reindex(df.columns.union(cols, sort=False), axis=1, fill_value=0)
print(df)
   A  B  C  D
0  1  2  3  0
1  4  4  5  0
2  1  2  3  0
3  4  4  6  0
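If you need this in several places, the same reindex idea can be wrapped in a small function. This is only a sketch: the name ensure_columns and its parameters are made up for illustration, and it assumes the column labels are unique, since reindex raises on duplicate labels.

```python
import pandas as pd


def ensure_columns(df, cols, fill_value=0):
    """Return a new DataFrame that has every column named in cols.

    Hypothetical helper, not part of pandas. Existing columns keep
    their order and values; missing ones are appended with fill_value.
    """
    # Collect only the requested columns that df does not have yet.
    missing = [c for c in cols if c not in df.columns]
    # reindex builds a new frame; fill_value is used only for new columns.
    return df.reindex(columns=[*df.columns, *missing], fill_value=fill_value)


df = pd.DataFrame({"A": [1, 4], "B": ["2", "4"], "C": [3, 5]})
result = ensure_columns(df, ["B", "C", "D"])
print(result.columns.tolist())
```

The original df is left unchanged, so assign the result back (df = ensure_columns(df, cols)) if you want to keep the new columns.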
Just to add, you can take the set difference between your list and the existing columns and pass it to assign with ** unpacking.
import numpy as np
cols = ['B','C','D','E']
df.assign(**{col : 0 for col in np.setdiff1d(cols,df.columns.values)})
   A  B  C  D  E
0  1  2  3  0  0
1  4  4  5  0  0
2  1  2  3  0  0
3  4  4  6  0  0
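Two details of the assign approach are easy to miss: assign returns a new DataFrame rather than modifying df, and np.setdiff1d returns its result sorted, so the new columns come out in sorted order rather than in the order of your list. A minimal sketch of a variant that keeps the list order, using a plain list comprehension instead of np.setdiff1d (the sample data and the name missing are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4], "B": ["2", "4"], "C": [3, 5]})
cols = ["E", "B", "D"]

# Keep only the requested names that df lacks, in the order given in cols.
missing = [c for c in cols if c not in df.columns]

# assign returns a new frame, so bind the result back to df.
# Note: ** unpacking only works when the column names are strings.
df = df.assign(**{c: 0 for c in missing})
print(df.columns.tolist())
```

Here E is added before D because that is the order in cols; with np.setdiff1d the new columns would be added as D, E.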