
Fastest way to cast values to their respective datatypes in Python

I have a list of values, all of them strings. I want to convert each value to its respective datatype. I have a mapping that tells me which type each value should be.

There are three different datatypes: int, str, datetime. The code needs to be able to handle the error cases with the data.

I am doing something like this:

tlist =  [ 'some datetime value', '12', 'string', .... ]

#convert it to: [ datetime object, 12, 'string', ....]

error_data = ['', ' ', '?', ...]

d = { 0: lambda x: datetime.strptime(x,...) if x not in error_data else x, 
      1: lambda x: int(x) if x not in error_data else 0,
      2: lambda x: x 
      ...
     }

result = [ d[i](j) for i, j in enumerate(tlist) ]

The list to convert is long (about 180 values), and I need to do this for thousands of such lists. The performance of the above code is very poor. What is the fastest way to do it?

Thank you

Sujit asked Sep 09 '11 15:09



1 Answer

If your datetime format is always consistent, why not let the type conversion itself handle the invalid data that you're currently trying to manage in error_data? This is not as sexy as some solutions, but it makes type conversion based on the position of the data in the list a little easier to maintain and expand upon.

from datetime import datetime

def convert(position, val):
    if position == 0:
        try:
            # assuming the date is always in this format
            return datetime.strptime(val, '%Y-%m-%d %H:%M:%S')
        except ValueError:
            return val  # leave unparseable dates as the original string
    elif position in (1, 15, 16):  # assuming that you have other int values in other "columns"
        try:
            return int(val)
        except ValueError:
            return 0  # invalid ints ('', ' ', '?', ...) become 0
    else:  # string type
        return val

# tlist is the list of strings from the question
result = [convert(i, j) for i, j in enumerate(tlist)]
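
Since the same layout is applied to thousands of lists, one variation worth trying is to work out the converter for each position once and then zip that list against every row, so the position checks are not repeated per value. This is only a sketch: the names to_datetime, to_int, keep, build_converters and convert_row are made up for illustration, the date format is the same assumption as above, and I have not timed it against your data, so measure it (for example with timeit) before relying on it.

```python
from datetime import datetime

DATE_FORMAT = '%Y-%m-%d %H:%M:%S'  # assumed format, same as in the answer above


def to_datetime(val):
    # Return the raw string unchanged when it does not parse as a date.
    try:
        return datetime.strptime(val, DATE_FORMAT)
    except ValueError:
        return val


def to_int(val):
    # Return 0 when the value is not a valid integer ('', ' ', '?', ...).
    try:
        return int(val)
    except ValueError:
        return 0


def keep(val):
    # String columns are passed through untouched.
    return val


def build_converters(width, datetime_positions=(0,), int_positions=(1,)):
    # Built once per layout; the positions here are hypothetical defaults.
    converters = [keep] * width
    for pos in datetime_positions:
        converters[pos] = to_datetime
    for pos in int_positions:
        converters[pos] = to_int
    return converters


def convert_row(row, converters):
    # Pair each value with the converter for its position.
    return [conv(val) for conv, val in zip(converters, row)]
```

Usage would look like converters = build_converters(180, (0,), (1, 15, 16)) once up front, then results = [convert_row(row, converters) for row in all_rows], where all_rows stands for your thousands of lists.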
Philip Southam answered Nov 15 '22 03:11
