I am looking for an elegant, Pythonic way to make a Pandas DataFrame's columns consistent with a master list of column names.
I have the following example that works, but is there a built-in Pandas method for accomplishing the same goal?
import pandas as pd

df1 = pd.DataFrame(data=[{'a': 1, 'b': 32, 'c': 32}])
print(df1)
   a   b   c
0  1  32  32

column_master_list = ['b', 'c', 'e', 'd', 'a']

def get_dataframe_with_consistent_header(df, headers):
    # Add any missing column, filled with NaN (note: this modifies df in place).
    for col in headers:
        if col not in df.columns:
            df[col] = float('nan')
    # Select the columns in the master-list order.
    return df[headers]

print(get_dataframe_with_consistent_header(df1, column_master_list))
    b   c   e   d  a
0  32  32 NaN NaN  1
You can use the reindex method. Pass in the list of column names and specify axis='columns'. Missing columns are added and filled with NaN by default, and the result is returned as a new DataFrame:
>>> df1.reindex(column_master_list, axis='columns')
    b   c   e   d  a
0  32  32 NaN NaN  1
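As a self-contained sketch of the same idea, the snippet below rebuilds the question's df1 and column_master_list and calls reindex with the equivalent columns= keyword. It also shows the fill_value parameter, which replaces the default NaN for columns that were missing. The names aligned and zero_filled are illustrative only and do not come from the original post.

```python
import pandas as pd

# Same sample data and master list as in the question.
df1 = pd.DataFrame([{'a': 1, 'b': 32, 'c': 32}])
column_master_list = ['b', 'c', 'e', 'd', 'a']

# reindex returns a new DataFrame; df1 itself is left unchanged.
# columns=... is equivalent to passing the list with axis='columns'.
aligned = df1.reindex(columns=column_master_list)

# fill_value replaces the default NaN in columns that were missing.
zero_filled = df1.reindex(columns=column_master_list, fill_value=0)

print(aligned)
print(zero_filled)
```

Unlike the helper function in the question, this approach does not add columns to df1, so the original frame can be reused afterwards.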