Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python pandas integer YYYYMMDD to datetime

Apologies in advance for this, but after two hours of searching and trying I cannot get the right answer here. I have a data frame, populated via pandas io sql.read_frame(). The column that is proving to be too much for me is of dtype int64. The integers is of the format YYYYMMDD. for example 20070530 - 30th of may 2007. I have tried a range of approaches, the most obvious being;

pd.to_datetime(dt['Date']) and pd.to_datetime(str(dt['Date']))

with multiple variations on the functions different parameters.

The result has been, at best, that the date interpreted as being the time. The date is set to 1970-01-01 - outcome as per above example 1970-01-01 00:00:00.020070530

I also tried various .map() functions found in simular posts.

I have noticed that according to np.date_range() can interpret string values of the format YYYYMMDD, but that is the closest I have come to seeing a solution.

If anyone has an answer, I would be very greatful!

EDIT: In view of the answer from Ed Chum, the problem is most likely related to encoding. rep() on a subset of the dataFrame yields:

OrdNo LstInvDt\n0
9 20070620\n1
11 20070830\n2
19 20070719\n3
21 20070719\n4
23 20070719\n5
26 20070911\n7
29 20070918\n8
31 0070816\n9
34 20070925\n10

This is when LstInvDt is dtype int64.

like image 825
Rookie Avatar asked Dec 16 '14 14:12

Rookie


People also ask

How do you convert int to datetime in Python?

Use pandas. to_datetime() to Convert Integer to Date & Time Format. Let's suppose that your integers contain both the date and time. In that case, the format should be specify is '%Y%m%d%H%M%S' .


1 Answers

to_datetime accepts a format string:

In [92]:  t = 20070530 pd.to_datetime(str(t), format='%Y%m%d') Out[92]: Timestamp('2007-05-30 00:00:00') 

example:

In [94]:  t = 20070530 df = pd.DataFrame({'date':[t]*10}) df Out[94]:        date 0  20070530 1  20070530 2  20070530 3  20070530 4  20070530 5  20070530 6  20070530 7  20070530 8  20070530 9  20070530 In [98]:  df['DateTime'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d')) df Out[98]:        date   DateTime 0  20070530 2007-05-30 1  20070530 2007-05-30 2  20070530 2007-05-30 3  20070530 2007-05-30 4  20070530 2007-05-30 5  20070530 2007-05-30 6  20070530 2007-05-30 7  20070530 2007-05-30 8  20070530 2007-05-30 9  20070530 2007-05-30 In [99]:  df.dtypes Out[99]: date                 int64 DateTime    datetime64[ns] dtype: object 

EDIT

Actually it's quicker to convert the type to string and then convert the entire series to a datetime rather than calling apply on every value:

In [102]:  df['DateTime'] = pd.to_datetime(df['date'].astype(str), format='%Y%m%d') df Out[102]:        date   DateTime 0  20070530 2007-05-30 1  20070530 2007-05-30 2  20070530 2007-05-30 3  20070530 2007-05-30 4  20070530 2007-05-30 5  20070530 2007-05-30 6  20070530 2007-05-30 7  20070530 2007-05-30 8  20070530 2007-05-30 9  20070530 2007-05-30 

timings

In [104]:  %timeit df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))  100 loops, best of 3: 2.55 ms per loop In [105]:  %timeit pd.to_datetime(df['date'].astype(str), format='%Y%m%d') 1000 loops, best of 3: 396 µs per loop 
like image 139
EdChum Avatar answered Oct 02 '22 10:10

EdChum