
ValueError: unconverted data remains on Pandas DataFrame

Tags: python, pandas

Data - Here's my data in a Pandas DataFrame

CallDateAndTimeStart
01/01/2010 00:26:28.003613 MST
01/01/2010 00:28:54.230713 MST
01/02/2008 14:12:11 MST
05/19/2010 09:12:32.080728 MST

My attempt to change column dtype to datetime64[ns]

df['CallDateAndTimeStart'] = pandas.to_datetime(df['CallDateAndTimeStart'],
    format='%m/%d/%Y %H:%M:%S')

Error Message - Without cleaning the data, I get the following error:

  File "C:\Python27\lib\site-packages\pandas\tseries\tools.py", line 308, in _convert_listlike
    raise e
ValueError: unconverted data remains: .003613 MST

Question

How can I fix my DataFrame column so that it converts to a datetime type? I posted my own answer, but is there a better one? Thanks.

Will asked Mar 18 '14 21:03


3 Answers

You got the error because the format string does not account for the microseconds and the time zone, so the parser stops after the seconds and reports the rest of the string as unconverted.
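To see where the leftover text comes from, here is a minimal sketch using the standard library's datetime.strptime on one value from the question. The variable names are only for illustration. It reproduces the same message, then shows that adding %f consumes the fractional seconds once the zone abbreviation is out of the way.

```python
from datetime import datetime

value = "01/01/2010 00:26:28.003613 MST"

# The format stops at the seconds, so everything after them is left over.
try:
    datetime.strptime(value, "%m/%d/%Y %H:%M:%S")
    message = None
except ValueError as exc:
    message = str(exc)

print(message)

# With the zone abbreviation removed, %f matches the fractional seconds.
parsed = datetime.strptime("01/01/2010 00:26:28.003613", "%m/%d/%Y %H:%M:%S.%f")
print(parsed.microsecond)
```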

If all the rows were in the same format, the matching format string would be:

df['CallDateAndTimeStart'] = pandas.to_datetime(df['CallDateAndTimeStart'],
    format='%m/%d/%Y %H:%M:%S.%f %Z')

Since the rows are not all in the same format, the simplest approach is to let pandas infer the format instead of declaring it:

df['CallDateAndTimeStart'] = pandas.to_datetime(df['CallDateAndTimeStart'])

The output:

        CallDateAndTimeStart
0 2010-01-01 00:26:28.003613
1 2010-01-01 00:28:54.230713
2 2008-01-02 14:12:11.000000
3 2010-05-19 09:12:32.080728
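How well inference copes with rows in different formats can depend on the pandas version, so a more explicit alternative may be useful. The sketch below is an assumption on my part, not part of the original answer. It strips the trailing zone abbreviation, parses once with microseconds and once without, and merges the two results. The sample frame simply repeats the data from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "CallDateAndTimeStart": [
        "01/01/2010 00:26:28.003613 MST",
        "01/01/2010 00:28:54.230713 MST",
        "01/02/2008 14:12:11 MST",
        "05/19/2010 09:12:32.080728 MST",
    ]
})

# Drop the trailing zone abbreviation (assumed to be capital letters at the end).
raw = df["CallDateAndTimeStart"].str.replace(r"\s+[A-Z]+$", "", regex=True)

# Parse with each explicit format; rows that do not match become NaT.
with_micro = pd.to_datetime(raw, format="%m/%d/%Y %H:%M:%S.%f", errors="coerce")
without_micro = pd.to_datetime(raw, format="%m/%d/%Y %H:%M:%S", errors="coerce")

# Fill the gaps of the first pass with the results of the second pass.
df["CallDateAndTimeStart"] = with_micro.fillna(without_micro)
print(df)
```

Any row that matches neither format stays NaT, which makes bad values easy to find afterwards.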

Note that when the format is inferred, the time zone is ignored because MST is not recognized, so the result is time-zone naive. You can attach the correct zone to it with tz_localize (and then switch to another zone with tz_convert). Also, if you are not interested in the microseconds, you can easily truncate them once the column is a datetime:

df['CallDateAndTimeStartRounded'] = df['CallDateAndTimeStart'].dt.floor('s')

The output:

        CallDateAndTimeStart CallDateAndTimeStartRounded
0 2010-01-01 00:26:28.003613         2010-01-01 00:26:28
1 2010-01-01 00:28:54.230713         2010-01-01 00:28:54
2 2008-01-02 14:12:11.000000         2008-01-02 14:12:11
3 2010-05-19 09:12:32.080728         2010-05-19 09:12:32
Morit answered Oct 14 '22 13:10


Code

I apply a custom function (convert_time) to the DataFrame column, then convert the cleaned strings:

import pandas

def convert_time(mytime):
    """Fix a datetime string by dropping the fractional seconds and the time zone."""
    # Drop everything from the first period onward (the fractional seconds)
    mytime = str(mytime).split(".")[0]

    # Keep only the date and time fields, which drops the time zone (e.g. MST)
    parts = mytime.split(" ")
    return parts[0] + " " + parts[1]

df['CallDateAndTimeStart'] = df['CallDateAndTimeStart'].apply(convert_time)

df['CallDateAndTimeStart'] = pandas.to_datetime(df['CallDateAndTimeStart'],
    format='%m/%d/%Y %H:%M:%S')

Output

CallDateAndTimeStart
2010-01-01 00:26:28
2010-01-01 00:28:54
2010-05-19 09:12:32
2008-01-02 14:12:11
2010-01-01 00:39:41
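The same cleanup can probably be done without a row-by-row apply. This is only a sketch of a vectorized variant. The regular expression and the sample frame are my own assumptions, based on the data shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "CallDateAndTimeStart": [
        "01/01/2010 00:26:28.003613 MST",
        "01/02/2008 14:12:11 MST",
    ]
})

# Capture only the leading "MM/DD/YYYY HH:MM:SS" part of each string.
pattern = r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})"
trimmed = df["CallDateAndTimeStart"].str.extract(pattern, expand=False)

# Every remaining string now matches one explicit format.
df["CallDateAndTimeStart"] = pd.to_datetime(trimmed, format="%m/%d/%Y %H:%M:%S")
print(df)
```

Like convert_time, this discards the microseconds and the time zone.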
Will answered Oct 14 '22 12:10


I encountered the same problem and solved it the same way you did (by applying a function to remove the unnecessary data).

I guess you could also use the standard library to avoid this issue:

>>> import time
>>> now = time.time() # get the current time in seconds
>>> now_format = time.ctime(now) # get a formatted time, like 'Thu May 21 17:43:46 2015'

Then use time.strptime() to get a standard time struct:

>>> standard_time_struct = time.strptime(now_format, "%a %b %d %H:%M:%S %Y")

The final result looks like this:

>>> standard_time_struct
time.struct_time(tm_year=2015, tm_mon=5, tm_mday=21, tm_hour=17, tm_min=49, tm_sec=10, tm_wday=3, tm_yday=141, tm_isdst=-1)
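Applied to the strings from the question, the standard-library route might look like the sketch below. The helper parse_call_time and the FORMATS tuple are hypothetical names of mine. The helper assumes every value ends with a space and a zone abbreviation, which it drops before trying each format in turn.

```python
from datetime import datetime

# Candidate formats, tried in order (with and without fractional seconds).
FORMATS = ("%m/%d/%Y %H:%M:%S.%f", "%m/%d/%Y %H:%M:%S")

def parse_call_time(text):
    """Parse one timestamp string, ignoring the trailing zone abbreviation."""
    # Drop the last space-separated token (assumed to be a zone such as MST).
    stamp = text.rsplit(" ", 1)[0]
    for fmt in FORMATS:
        try:
            return datetime.strptime(stamp, fmt)
        except ValueError:
            continue
    raise ValueError("no known format matches: %r" % text)

print(parse_call_time("01/01/2010 00:26:28.003613 MST"))
print(parse_call_time("01/02/2008 14:12:11 MST"))
```

It could be passed to apply on the column, at the cost of being slower than a vectorized pandas call on large frames.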
Rogim answered Oct 14 '22 13:10