The goal here is to find the columns that do not exist in df and create them with null values.
I have a list of column names like below:
column_list = ('column_1', 'column_2', 'column_3')
When I try to check whether each column exists, I get True only for the columns that exist and never get False for the ones that are missing.
for column in column_list:
    print(df.columns.isin(column_list).any())
In PySpark, I can achieve this using the below:
from pyspark.sql.functions import lit
for column in column_list:
    if column not in df.columns:
        df = df.withColumn(column, lit(''))
How can I achieve the same using Pandas?
Here is how I would approach it:
import numpy as np
for col in column_list:
    if col not in df.columns:
        df[col] = np.nan
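A minimal runnable sketch of the loop above, assuming a small hypothetical DataFrame in which only column_1 exists (the sample data is invented for illustration). It also shows a per-column membership check, which yields one True or False per requested name rather than a single value for the whole list.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only 'column_1' exists at the start.
df = pd.DataFrame({'column_1': [1, 2, 3]})
column_list = ('column_1', 'column_2', 'column_3')

# Per-column membership check: one True/False per requested name.
exists = {col: col in df.columns for col in column_list}

# Create each missing column, filled with NaN.
for col in column_list:
    if col not in df.columns:
        df[col] = np.nan
```

Existing columns are left untouched; only the missing names are added, in the order they appear in column_list.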
Using np.isin, assign, and unpacking kwargs:
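# Boolean mask: True where a name in column_list is already a column of df
s = np.isin(column_list, df.columns)
# Add every missing column in one call, filled with None
df = df.assign(**{k: None for k in np.array(column_list)[~s]})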
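The same idea as a self-contained sketch, again assuming a hypothetical DataFrame that starts with only column_1. The keys are converted with str() here so the new column labels are plain Python strings; that conversion is an addition for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only 'column_1' exists at the start.
df = pd.DataFrame({'column_1': [1, 2, 3]})
column_list = ('column_1', 'column_2', 'column_3')

# Boolean mask: True where the requested name is already a column.
s = np.isin(column_list, df.columns)

# Names that are missing from df.
missing = [str(k) for k in np.array(column_list)[~s]]

# Add all missing columns in one call; each is filled with None.
df = df.assign(**{k: None for k in missing})
```

Unlike the loop, assign returns a new DataFrame, so the result has to be bound back to df.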