 

Change dataframe column names from string format to datetime

I have a dataframe where the column names are dates (year-month) stored as strings. How can I convert these names to datetime format? I tried this:

new_cols = pd.to_datetime(df.columns)
df = df[new_cols]

but I get the error:

KeyError: "DatetimeIndex(['2000-01-01', '2000-02-01',
               '2000-03-01', '2000-04-01',
               '2000-05-01', '2000-06-01',
               '2000-07-01', '2000-08-01',
               '2000-09-01', '2000-10-01',
               '2015-11-01', '2015-12-01',
               '2016-01-01', '2016-02-01',
               '2016-03-01', '2016-04-01',
               '2016-05-01', '2016-06-01',
               '2016-07-01', '2016-08-01'],
              dtype='datetime64[ns]', length=200, freq=None) not in index"

Thanks!

gtroupis asked Jan 16 '17 13:01


2 Answers

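Selecting columns with df[new_cols] (or with loc) does not change the column names. Pandas looks up the new datetime labels among the existing string labels, finds none of them, and raises a KeyError.

Instead, you need to assign the output to the columns: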

df.columns = pd.to_datetime(df.columns)
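The question describes the labels as year-month strings. If they look like '2000-01' rather than full dates (an assumption, since the raw labels aren't shown), passing an explicit format to pd.to_datetime makes the parsing unambiguous. A minimal sketch with hypothetical sample labels:

```python
import pandas as pd

# Hypothetical year-month labels; the question's real column names are not shown
cols = ["2000-01", "2000-02", "2000-03"]
df = pd.DataFrame([[0, 1, 2]], columns=cols)

# Parse with an explicit format; each label becomes the first day of its month
df.columns = pd.to_datetime(df.columns, format="%Y-%m")
print(df.columns)
```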

Sample:

import numpy as np
import pandas as pd

cols = ['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01', '2000-05-01']
vals = np.arange(5)
df = pd.DataFrame(columns=cols, data=[vals])
print (df)
   2000-01-01  2000-02-01  2000-03-01  2000-04-01  2000-05-01
0           0           1           2           3           4

print (df.columns)
Index(['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01', '2000-05-01'], dtype='object')

df.columns = pd.to_datetime(df.columns)

print (df.columns)
DatetimeIndex(['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01',
               '2000-05-01'],
              dtype='datetime64[ns]', freq=None)
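Once the labels are a DatetimeIndex, they compare as dates rather than as strings, so you can, for example, keep only the columns from a given month onward. A small sketch reusing the sample data above (the cutoff date is an arbitrary illustration):

```python
import numpy as np
import pandas as pd

cols = ['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01', '2000-05-01']
df = pd.DataFrame(columns=cols, data=[np.arange(5)])
df.columns = pd.to_datetime(df.columns)

# Boolean mask over the column labels: True for dates on or after the cutoff
mask = df.columns >= pd.Timestamp('2000-03-01')
recent = df.loc[:, mask]
print(recent)
```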

It is also possible to convert to periods:

print (df.columns)
Index(['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01', '2000-05-01'], dtype='object')

df.columns = pd.to_datetime(df.columns).to_period('M')

print (df.columns)
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05'],
             dtype='period[M]', freq='M')
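A PeriodIndex represents whole months rather than single instants, and it can be turned back into month-start timestamps with to_timestamp(). A minimal round-trip sketch on a few of the sample labels:

```python
import pandas as pd

cols = ['2000-01-01', '2000-02-01', '2000-03-01']

# Monthly periods: each label now stands for the whole month
periods = pd.to_datetime(cols).to_period('M')

# Back to timestamps; by default each period maps to the start of its month
starts = periods.to_timestamp()

print(periods)
print(starts)
```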
jezrael answered Sep 21 '22 08:09


To expand on jezrael's answer: the original code tries to select columns from df using the labels stored in new_cols and store the result back in df. Since those datetime labels don't exist among the columns of df yet (the columns are still strings), pandas raises a KeyError saying it can't find them in the index.

As such, you need to assign the new values to df.columns to rename the columns, as in jezrael's answer.
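The difference between selecting and assigning can be seen directly. This sketch (the sample labels are illustrative) shows that indexing with the converted labels fails because the frame still has string columns, while assigning to df.columns replaces the labels themselves:

```python
import pandas as pd

cols = ['2000-01-01', '2000-02-01']
df = pd.DataFrame([[0, 1]], columns=cols)
new_cols = pd.to_datetime(df.columns)

# Selecting: pandas looks for Timestamp labels among string labels and finds none
try:
    df[new_cols]
    selection_failed = False
except KeyError as err:
    selection_failed = True
    print("KeyError:", err)

# Assigning: the labels are replaced, so nothing needs to be looked up
df.columns = new_cols
print(df.columns.dtype)
```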

Fred Cascarini answered Sep 20 '22 08:09