Making columns and ordering consistent in a Pandas DataFrame

I am looking for an elegant, Pythonic way to make a Pandas DataFrame's columns consistent with a master list. Specifically:

  1. Ensure all the columns in a master list are present, and if not add in an empty placeholder column.
  2. Ensure that the columns are in the same order as the master list.

I have the following example that works, but is there a built-in Pandas method for accomplishing the same goal?

import numpy as np
import pandas as pd

df1 = pd.DataFrame(data=[{'a': 1, 'b': 32, 'c': 32}])
print(df1)
   a   b   c
0  1  32  32
column_master_list = ['b', 'c', 'e', 'd', 'a']
def get_dataframe_with_consistent_header(df, headers):
    # Add a NaN placeholder for each master-list column that is missing.
    for col in headers:
        if col not in df.columns:
            df[col] = np.nan
    # Return the columns in master-list order.
    return df[headers]

print(get_dataframe_with_consistent_header(df1, column_master_list))
    b   c   e   d  a
0  32  32 NaN NaN  1
skulz00 asked Nov 11 '14 14:11


1 Answer

You can use the reindex method. Pass in the list of column names and specify axis='columns'. Columns in the list that are missing from the DataFrame are added, filled with NaN by default, and the result follows the order of the list:

>>> df1.reindex(column_master_list, axis='columns')
    b   c   e   d  a
0  32  32 NaN NaN  1
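
As a self-contained sketch of the same idea, the snippet below rebuilds the question's df1 and column_master_list and passes the list through the columns= keyword, which should be equivalent to axis='columns'. The names consistent and zero_filled are only illustrative, and fill_value=0 is an assumed placeholder, not something the question asked for.

```python
import pandas as pd

# Same sample data and master list as in the question.
df1 = pd.DataFrame(data=[{'a': 1, 'b': 32, 'c': 32}])
column_master_list = ['b', 'c', 'e', 'd', 'a']

# reindex returns a new DataFrame; df1 itself is left unchanged.
consistent = df1.reindex(columns=column_master_list)

# Optional: use a placeholder other than NaN for the added columns
# (0 is an arbitrary choice for illustration).
zero_filled = df1.reindex(columns=column_master_list, fill_value=0)

print(consistent)
print(zero_filled)
```

Because reindex does not modify df1, assign the result back (df1 = df1.reindex(...)) if you want to keep working with the reordered frame.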
Alex Riley answered Sep 24 '22 01:09