Unknown string format on pd.to_datetime

I have a data set with a column date like this:

    cod        date                   value
0   1O8        2015-01-01 00:00:00    2.1
1   1O8        2015-01-01 01:00:00    2.3
2   1O8        2015-01-01 02:00:00    3.5
3   1O8        2015-01-01 03:00:00    4.5
4   1O8        2015-01-01 04:00:00    4.4
5   1O8        2015-01-01 05:00:00    3.2
6   1O9        2015-01-01 00:00:00    1.4
7   1O9        2015-01-01 01:00:00    8.6
8   1O9        2015-01-01 02:00:00    3.3
10  1O9        2015-01-01 03:00:00    1.5
11  1O9        2015-01-01 04:00:00    2.4
12  1O9        2015-01-01 05:00:00    7.2

The dtype of the date column is object. To apply some functions afterwards, I need to change the date column's type to datetime. I tried different solutions, such as:

pd.to_datetime(df['date'], errors='raise', format ='%Y-%m-%d HH:mm:ss')
pd.to_datetime(df['date'], errors='coerce', format ='%Y-%m-%d HH:mm:ss')
df['date'].apply(pd.to_datetime, format ='%Y-%m-%d HH:mm:ss')

But the error is always the same:

TypeError: Unrecognized value type: <class 'str'>
ValueError: Unknown string format

The strange thing is that if I apply the function to a sample of the data set, it works correctly, but if I apply it to the whole data set, the error is raised. The data has no missing values and the dtype is the same for all values.

How can I fix this error?

jjgasse asked Nov 29 '18 10:11


1 Answer

There are three issues:

  1. pd.to_datetime and pd.Series.apply don't work in place, so your solutions won't modify your series. Assign back after conversion.
  2. Your third solution needs errors='coerce' to guarantee no errors.
  3. For the time component you need strftime-style directives beginning with %: use %H:%M:%S rather than HH:mm:ss.
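Points 1 and 3 can be checked with a minimal sketch. The two-row series below is made-up sample data, not the asker's real data set, and the exact behaviour may vary slightly between pandas versions:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical sample data for illustration only.
s = pd.Series(['2015-01-01 00:00:00', '2015-01-01 01:00:00'])

# Point 1: to_datetime returns a new Series; s itself is left unchanged.
converted = pd.to_datetime(s, format='%Y-%m-%d %H:%M:%S')
print(is_datetime64_any_dtype(s))          # still strings
print(is_datetime64_any_dtype(converted))  # datetime values

# Point 3: 'HH:mm:ss' is treated as literal text, so no value matches
# the format and errors='coerce' turns every entry into NaT.
bad = pd.to_datetime(s, errors='coerce', format='%Y-%m-%d HH:mm:ss')
print(bad.isna().all())
```

Because the result is only returned, it has to be assigned back to the column to take effect.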

So you can use:

import pandas as pd

df = pd.DataFrame({'date': ['2015-01-01 00:00:00', '2016-12-20 15:00:20',
                            '2017-08-05 00:05:00', '2018-05-11 00:10:00']})

df['date'] = pd.to_datetime(df['date'], errors='coerce', format='%Y-%m-%d %H:%M:%S')

print(df)

                  date
0  2015-01-01 00:00:00
1  2016-12-20 15:00:20
2  2017-08-05 00:05:00
3  2018-05-11 00:10:00

In this particular instance, the format is standard and can be omitted:

df['date'] = pd.to_datetime(df['date'], errors='coerce')
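Since the conversion reportedly works on a sample but fails on the full data set, some rows may not match the expected format. One possible way to locate them is to coerce and then inspect the rows that became NaT. This is a sketch: the data frame and the malformed entry below are invented for illustration.

```python
import pandas as pd

# Hypothetical data with one malformed entry, standing in for the full data set.
df = pd.DataFrame({'date': ['2015-01-01 00:00:00', 'not a date',
                            '2015-01-01 02:00:00']})

# Unparseable strings become NaT instead of raising an error.
parsed = pd.to_datetime(df['date'], errors='coerce', format='%Y-%m-%d %H:%M:%S')

# Keep rows that had a value originally but failed to parse.
bad_rows = df[parsed.isna() & df['date'].notna()]
print(bad_rows)
```

Inspecting bad_rows shows which original strings need cleaning before the column is converted.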
jpp answered Nov 19 '22 17:11