
pandas.to_datetime inconsistent time string format

I am attempting to convert the index of a pandas.DataFrame from string format to a datetime index, using pandas.to_datetime().

Import pandas:

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.10.1'

Create an example DataFrame:

In [3]: d = {'data' : pd.Series([1.,2.], index=['26/12/2012', '10/01/2013'])}

In [4]: df=pd.DataFrame(d)

Look at indices. Note that the date format is day/month/year:

In [5]: df.index
Out[5]: Index([26/12/2012, 10/01/2013], dtype=object)

Convert index to datetime:

In [6]: pd.to_datetime(df.index)
Out[6]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-12-26 00:00:00, 2013-10-01 00:00:00]
Length: 2, Freq: None, Timezone: None

Already at this stage, you can see that the two entries have been parsed differently. The first is fine, but the second has its month and day swapped.

This is what I want to write, but without the inconsistent parsing of the date strings:

In [7]: df.set_index(pd.to_datetime(df.index))
Out[7]: 
data
2012-12-26   1
2013-10-01   2

I guess the first entry is correct because the function 'knows' there aren't 26 months, and so does not choose the default month/day/year format.
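The guess above can be illustrated with the standard library alone. This is only a sketch of why one string is ambiguous and the other is not; it does not show what pandas does internally, and the variable names are made up for the example.

```python
from datetime import datetime

# "10/01/2013" is valid under both conventions, so a parser has to pick one.
ambiguous = "10/01/2013"
day_first = datetime.strptime(ambiguous, "%d/%m/%Y")    # read as 10 January 2013
month_first = datetime.strptime(ambiguous, "%m/%d/%Y")  # read as 1 October 2013

# "26/12/2012" cannot be month/day/year, because there is no month 26.
unambiguous = "26/12/2012"
try:
    datetime.strptime(unambiguous, "%m/%d/%Y")
    month_first_ok = True
except ValueError:
    month_first_ok = False

print(day_first.date(), month_first.date(), month_first_ok)
```

So a parser that tries month-first and falls back to day-first on failure would give exactly the mixed result seen in Out[6].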

Is there another/better way to do this? Can I pass the format into the to_datetime() function?

Thank you.

EDIT:

I have found a way to do this, without pandas.to_datetime:

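from datetime import datetime as dt  # datetime is a class inside the datetime module, so it cannot be imported with "import datetime.datetime"
date_string_list = df.index.tolist()
datetime_list = [dt.strptime(s, '%d/%m/%Y') for s in date_string_list]  # parse each label as day/month/year
df.index = datetime_list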

but it's a bit messy. Any improvements welcome.

random.me asked Apr 10 '13 15:04

1 Answer

There is a (hidden?) dayfirst argument to to_datetime:

In [23]: pd.to_datetime(df.index, dayfirst=True)
Out[23]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-12-26 00:00:00, 2013-01-10 00:00:00]
Length: 2, Freq: None, Timezone: None

From pandas 0.11 onwards, you can use the format argument:

In [24]: pd.to_datetime(df.index, format='%d/%m/%Y')
Out[24]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-12-26 00:00:00, 2013-01-10 00:00:00]
Length: 2, Freq: None, Timezone: None
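
Putting both options together with the assignment back to the index, here is a minimal sketch. It assumes a pandas version recent enough to accept both dayfirst and format (0.11 or later, per the above); the names by_dayfirst and by_format are just for illustration.

```python
import pandas as pd

# Same day/month/year strings as in the question.
df = pd.DataFrame({"data": [1.0, 2.0]}, index=["26/12/2012", "10/01/2013"])

# Option 1: tell the parser that the day comes before the month.
by_dayfirst = pd.to_datetime(df.index, dayfirst=True)

# Option 2: give the exact format, so nothing has to be guessed.
by_format = pd.to_datetime(df.index, format="%d/%m/%Y")

# Assign the parsed result back as the new index.
df.index = by_format
print(df)
```

If every string is known to follow a single layout, format is the stricter choice, since a string that does not match raises an error instead of being reinterpreted.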
Andy Hayden answered Sep 30 '22 15:09
