Data - Here's my data in a pandas DataFrame:
CallDateAndTimeStart
01/01/2010 00:26:28.003613 MST
01/01/2010 00:28:54.230713 MST
01/02/2008 14:12:11 MST
05/19/2010 09:12:32.080728 MST
My attempt to change the column dtype to datetime64[ns]:
df['CallDateAndTimeStart'] = pandas.to_datetime(df['CallDateAndTimeStart'],
format='%m/%d/%Y %H:%M:%S')
Error Message - Without cleaning the data, I get the following error:
File "C:\Python27\lib\site-packages\pandas\tseries\tools.py", line 308, in _convert_listlike
    raise e
ValueError: unconverted data remains: .003613 MST
Question
How would I correct my dataframe column so that it can convert to a datetime type? I posted my answer, but is there a better answer? Thanks.
You get this error because your format string doesn't account for the microseconds or the time zone.
If all the rows were in the same format, the correct format would be:
df['CallDateAndTimeStart'] = pandas.to_datetime(df['CallDateAndTimeStart'],
format='%m/%d/%Y %H:%M:%S.%f %Z')
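To see why the original format fails and what `%f` adds, here is a small sketch using only the standard library's `datetime.strptime`, which raises the same "unconverted data remains" message. It assumes the zone abbreviation is always the last space-separated token and keeps it as plain text instead of parsing it, because the standard library's `%Z` only accepts a few names (UTC, GMT, and the machine's local zone names).

```python
from datetime import datetime

raw = "01/01/2010 00:26:28.003613 MST"

# The question's format stops at whole seconds, so strptime rejects the leftover text.
error_message = ""
try:
    datetime.strptime(raw, "%m/%d/%Y %H:%M:%S")
except ValueError as exc:
    error_message = str(exc)  # unconverted data remains: .003613 MST

# Assumption: the zone abbreviation is always the final token, so split it off
# and parse the remainder with %f to consume the fractional seconds.
stamp, _, zone = raw.rpartition(" ")
parsed = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S.%f")

print(error_message)
print(parsed, zone)  # 2010-01-01 00:26:28.003613 MST
```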
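Since not all the rows are in the same format, the best way is to let pandas infer the format instead of declaring it (in pandas 2.0 and later, which try to infer a single format from the first value and apply it to every row, mixed data like this may need format='mixed'):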
df['CallDateAndTimeStart'] = pandas.to_datetime(df['CallDateAndTimeStart'])
The output:
CallDateAndTimeStart
0 2010-01-01 00:26:28.003613
1 2010-01-01 00:28:54.230713
2 2008-01-02 14:12:11.000000
3 2010-05-19 09:12:32.080728
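If you would rather not depend on format inference at all, one alternative is a two-pass parse with explicit formats: parse once with fractional seconds, once without, and combine the results. This is a sketch of a different technique, not the method above, and it assumes every value ends with the literal zone " MST", which is stripped rather than interpreted.

```python
import pandas as pd

s = pd.Series([
    "01/01/2010 00:26:28.003613 MST",
    "01/01/2010 00:28:54.230713 MST",
    "01/02/2008 14:12:11 MST",
    "05/19/2010 09:12:32.080728 MST",
])

# Assumption: every value ends with " MST"; strip it so only date and time remain.
naive = s.str.replace(r"\s+MST$", "", regex=True)

# Pass 1: rows with fractional seconds parse; all other rows become NaT.
with_frac = pd.to_datetime(naive, format="%m/%d/%Y %H:%M:%S.%f", errors="coerce")
# Pass 2: rows with whole seconds only parse; all other rows become NaT.
whole_secs = pd.to_datetime(naive, format="%m/%d/%Y %H:%M:%S", errors="coerce")

# Each row matched exactly one format, so fill one pass's gaps from the other.
result = with_frac.fillna(whole_secs)
print(result)
```

Because `errors="coerce"` turns non-matching rows into NaT, any row that fits neither format stays NaT, so `result.isna()` points straight at the bad data.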
Notice that this solution ignores the time zone, since MST is not recognized. Because the result is time-zone naive, use tz_localize to attach the correct zone (tz_convert only works on values that already have one). Also, if you are not interested in the microseconds, you can easily drop them once the column is a datetime (floor truncates; dt.round('s') rounds to the nearest second instead):
df['CallDateAndTimeStartRounded'] = df['CallDateAndTimeStart'].dt.floor('s')
The output:
CallDateAndTimeStart CallDateAndTimeStartRounded
0 2010-01-01 00:26:28.003613 2010-01-01 00:26:28
1 2010-01-01 00:28:54.230713 2010-01-01 00:28:54
2 2008-01-02 14:12:11.000000 2008-01-02 14:12:11
3 2010-05-19 09:12:32.080728 2010-05-19 09:12:32
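For the time-zone step mentioned above, a minimal sketch: localize the naive values first, then convert. It assumes "MST" means a fixed UTC-7 offset with no daylight saving; "Etc/GMT+7" is the IANA name for UTC-7 (the sign in Etc/ names is deliberately inverted), and "America/Phoenix" would be another reasonable choice.

```python
import pandas as pd

# Naive timestamps, as produced by the parsing step above.
naive = pd.Series([
    pd.Timestamp("2010-01-01 00:26:28.003613"),
    pd.Timestamp("2008-01-02 14:12:11"),
])

# Assumption: MST is a fixed UTC-7 offset; "Etc/GMT+7" means UTC-7 in the IANA database.
localized = naive.dt.tz_localize("Etc/GMT+7")

# Once the values are tz-aware, tz_convert can move them to any other zone.
as_utc = localized.dt.tz_convert("UTC")
print(as_utc)
```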
Code
I apply a custom function, convert_time, to the DataFrame column and then convert it:
import pandas

def convert_time(mytime):
    """Fix DateTime by removing details after . and timezones."""
    # Remove the period and everything after it (fractional seconds and zone)
    mytime = str(mytime).split(".")[0]
    # Remove the time zone (e.g. MST) if still present, keeping date and time
    date_part, time_part = mytime.split(" ")[:2]
    return date_part + " " + time_part

# The function must be defined before apply() can use it
df['CallDateAndTimeStart'] = df['CallDateAndTimeStart'].apply(convert_time)
df['CallDateAndTimeStart'] = pandas.to_datetime(df['CallDateAndTimeStart'],
                                                format='%m/%d/%Y %H:%M:%S')
Output
CallDateAndTimeStart
2010-01-01 00:26:28
2010-01-01 00:28:54
2010-05-19 09:12:32
2008-01-02 14:12:11
2010-01-01 00:39:41
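The same truncation can also be written without a helper function, using pandas' string accessor to keep only the leading "MM/DD/YYYY HH:MM:SS" part. This is a sketch that assumes every field is zero-padded, as in the sample data; values that don't match the pattern become NaN and then NaT.

```python
import pandas as pd

s = pd.Series([
    "01/01/2010 00:26:28.003613 MST",
    "01/02/2008 14:12:11 MST",
    "05/19/2010 09:12:32.080728 MST",
])

# Capture only the leading date and time, dropping fractional seconds and the zone,
# which is what convert_time does. Assumption: all fields are zero-padded.
pattern = r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})"
trimmed = s.str.extract(pattern, expand=False)

result = pd.to_datetime(trimmed, format="%m/%d/%Y %H:%M:%S")
print(result)
```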
I ran into the same problem and solved it the same way you did (applying a function to remove the unnecessary data).
You could also use the standard library's time module to avoid this issue:
>>> import time
>>> now = time.time() # current time in seconds since the epoch
>>> now_format = time.ctime(now) # formatted time, like 'Thu May 21 17:43:46 2015'
Then use time.strptime() to parse it into a standard time struct (ctime() uses abbreviated month names, so the directive is %b, not %B):
>>> standard_time_struct = time.strptime(now_format, "%a %b %d %H:%M:%S %Y")
You get the final result like this:
>>> standard_time_struct
time.struct_time(tm_year=2015, tm_mon=5, tm_mday=21, tm_hour=17, tm_min=49, tm_sec=10, tm_wday=3, tm_yday=141, tm_isdst=-1)
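A short sketch of the same idea with a fixed string instead of the current time: time.strptime's default format is "%a %b %d %H:%M:%S %Y", which already matches ctime() output, so the format argument can be omitted. It assumes an English (C) locale, since %a and %b match locale-specific day and month names.

```python
import time

stamp = "Thu May 21 17:43:46 2015"  # example ctime()-style string

# The default format "%a %b %d %H:%M:%S %Y" is the layout ctime() produces.
parsed = time.strptime(stamp)

# %b matches abbreviated months such as "Jan"; %B would expect "January".
# ctime() pads single-digit days with a space; whitespace in the format absorbs it.
jan = time.strptime("Fri Jan  1 00:26:28 2010", "%a %b %d %H:%M:%S %Y")

print(parsed.tm_hour, parsed.tm_min, parsed.tm_sec)  # 17 43 46
```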