
Handling multiple datetime formats with pd.to_datetime

I have datetime data in two formats, for example 29062017 and 01AUG2017. As you can see, the month sits in the middle of each value.

I want to convert this data to datetime, but when I use pd.to_datetime, it doesn't work.

Do you know a good way to solve this problem?

yuchen huang asked Nov 13 '17 02:11


2 Answers

You can call pd.to_datetime once for each format, using its format argument:

In [11]: s = pd.Series(["29062017", "01AUG2017"])

In [12]: pd.to_datetime(s, format="%d%m%Y", errors="coerce")
Out[12]:
0   2017-06-29
1          NaT
dtype: datetime64[ns]

In [13]: pd.to_datetime(s, format="%d%b%Y", errors="coerce")
Out[13]:
0          NaT
1   2017-08-01
dtype: datetime64[ns]

Note: errors="coerce" makes any value that fails to parse become NaT instead of raising an error.
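To see what errors="coerce" changes, here is a small sketch comparing it with the default errors="raise" on the same series. It only uses pd.to_datetime as shown above; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(["29062017", "01AUG2017"])

# Default errors="raise": the first value that doesn't match aborts the whole call
try:
    pd.to_datetime(s, format="%d%m%Y")
    raised = False
except ValueError:
    raised = True
print("raised ValueError:", raised)

# errors="coerce": values that don't match become NaT instead
coerced = pd.to_datetime(s, format="%d%m%Y", errors="coerce")
print(coerced.isna().tolist())
```

With coerce, every row gets a result, so the NaT positions show exactly which rows need a different format.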

Then fill the NaTs in one result with the values from the other, e.g. using fillna:

In [14]: pd.to_datetime(s, format="%d%m%Y", errors="coerce").fillna(pd.to_datetime(s, format="%d%b%Y", errors="coerce"))
Out[14]:
0   2017-06-29
1   2017-08-01
dtype: datetime64[ns]

Any strings that don't match either format will remain NaT.
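If there are more than two formats, the same fillna chain can be written as a loop. This is a minimal sketch; parse_multi_format is a hypothetical helper name, not part of pandas. Formats are tried in order, so when a string matches more than one format, the earlier one wins.

```python
from typing import Sequence

import pandas as pd


def parse_multi_format(s: pd.Series, formats: Sequence[str]) -> pd.Series:
    """Hypothetical helper: parse `s` with each format in turn, earlier formats first."""
    if not formats:
        raise ValueError("at least one format is required")
    # Start from the first format; values it cannot parse are NaT
    result = pd.to_datetime(s, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        # fillna only fills positions that are still NaT
        result = result.fillna(pd.to_datetime(s, format=fmt, errors="coerce"))
    return result


s = pd.Series(["29062017", "01AUG2017", "not a date"])
print(parse_multi_format(s, ["%d%m%Y", "%d%b%Y"]))
```

As with the two-format version, "not a date" matches neither format and stays NaT.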

Andy Hayden answered Nov 15 '22 22:11


An alternative is to use a mapping dict with replace to substitute each month abbreviation with its numeric equivalent:

s = pd.Series(["29062017", "01AUG2017"]); s

0     29062017
1    01AUG2017
dtype: object

m = {'JAN': '01', 'FEB': '02', 'MAR': '03', 'APR': '04', 'MAY': '05', 'JUN': '06',
     'JUL': '07', 'AUG': '08', 'SEP': '09', 'OCT': '10', 'NOV': '11', 'DEC': '12'}

s = s.replace(m, regex=True); s

0    29062017
1    01082017
dtype: object

Now all you need is a single pd.to_datetime call:

pd.to_datetime(s, format="%d%m%Y", errors="coerce")

0   2017-06-29
1   2017-08-01
dtype: datetime64[ns]
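Here is a self-contained version of this mapping approach. One step is an addition not in the answer: the strings are upper-cased first, assuming the data might also contain mixed-case months such as "Sep". The names MONTHS and normalize_months are illustrative.

```python
import pandas as pd

# Month abbreviation -> two-digit month number
MONTHS = {
    "JAN": "01", "FEB": "02", "MAR": "03", "APR": "04",
    "MAY": "05", "JUN": "06", "JUL": "07", "AUG": "08",
    "SEP": "09", "OCT": "10", "NOV": "11", "DEC": "12",
}


def normalize_months(s: pd.Series) -> pd.Series:
    """Swap month abbreviations for digits so a single "%d%m%Y" format fits every row."""
    # Upper-case first so "Sep" or "sep" also match the upper-case keys
    return s.str.upper().replace(MONTHS, regex=True)


s = pd.Series(["29062017", "01AUG2017", "15Sep2017"])
parsed = pd.to_datetime(normalize_months(s), format="%d%m%Y", errors="coerce")
print(parsed)
```

Because regex=True replaces substrings, this relies on the values containing only digits plus one month abbreviation; free text around the dates could also be altered.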
cs95 answered Nov 15 '22 22:11