
Insert multiple columns if they do not exist in pandas

Tags: python, pandas

I have the following DataFrame:

import pandas as pd

list_columns = ['A', 'B', 'C']
list_data = [
    [1, '2', 3],
    [4, '4', 5],
    [1, '2', 3],
    [4, '4', 6]
]
df = pd.DataFrame(columns=list_columns, data=list_data)

I want to check whether several columns exist and create any that are missing.

Example: if B, C and D do not all exist, create the missing ones (for the df above, only column D would be created). I know how to do this for a single column:

if 'D' not in df:
    df['D'] = 0

Is there a way to check all of the columns at once and create only the ones that are missing, without writing an if for each column?

Christian asked Jun 18 '20 12:06


2 Answers

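A loop is not necessary here. Use DataFrame.reindex with Index.union. With sort=False the existing column order is kept and the missing names are appended, and fill_value=0 fills the new columns:

cols = ['B','C','D']

df = df.reindex(df.columns.union(cols, sort=False), axis=1, fill_value=0)
print(df)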
   A  B  C  D
0  1  2  3  0
1  4  4  5  0
2  1  2  3  0
3  4  4  6  0
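
The reindex approach can be wrapped in a small helper. This is a minimal sketch and not part of the original answer: the function name ensure_columns and its parameters are hypothetical, and it assumes the column labels are unique.

```python
import pandas as pd


def ensure_columns(df, cols, fill_value=0):
    """Return a new DataFrame that has every column in cols (hypothetical helper)."""
    # union(..., sort=False) keeps the existing column order and appends
    # the labels from cols that are not present yet.
    wanted = df.columns.union(cols, sort=False)
    # Columns that did not exist are filled with fill_value;
    # existing columns keep their data.
    return df.reindex(columns=wanted, fill_value=fill_value)


df = pd.DataFrame(
    columns=['A', 'B', 'C'],
    data=[[1, '2', 3], [4, '4', 5], [1, '2', 3], [4, '4', 6]],
)
result = ensure_columns(df, ['B', 'C', 'D'])
print(result.columns.tolist())
```

Because reindex returns a new object, the input df is left unchanged and the caller decides whether to rebind it.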
jezrael answered Sep 25 '22 17:09


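In addition, you can take the set difference between the columns you want and the existing columns, then pass it to DataFrame.assign with ** unpacking. Note that assign returns a new DataFrame, so assign the result back to df to keep it:

import numpy as np
cols = ['B','C','D','E']

df = df.assign(**{col: 0 for col in np.setdiff1d(cols, df.columns.values)})
print(df)
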
   A  B  C  D  E
0  1  2  3  0  0
1  4  4  5  0  0
2  1  2  3  0  0
3  4  4  6  0  0
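
Two details are worth checking with this approach: DataFrame.assign returns a new DataFrame rather than modifying df, and np.setdiff1d returns its result sorted, so the new columns appear in sorted order rather than in the order of cols. The sketch below is a variation, not the answer's code: it uses a plain list comprehension to keep the requested order, and the names missing and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['A', 'B', 'C'],
    data=[[1, '2', 3], [4, '4', 5], [1, '2', 3], [4, '4', 6]],
)
cols = ['E', 'B', 'D']

# Keep only the requested names that df does not have yet,
# in the order they were requested.
missing = [col for col in cols if col not in df.columns]

# assign builds a new DataFrame; df itself is not modified.
out = df.assign(**{col: 0 for col in missing})

print(out.columns.tolist())
print(df.columns.tolist())
```

Since ** unpacking passes the labels as keyword arguments, this pattern only works when the column labels are strings.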
Umar.H answered Sep 23 '22 17:09