I have a series of CSV files with a column containing a Python datetime-formatted string. While parsing the files (which could be tens of thousands of rows long), I want the date column to be converted from a string to an actual datetime object.
An example CSV row:
['0', '(2011, 12, 11, 15, 45, 20)', 'Arduino/libraries/dallas-temperature-control/'],
As you can see, the date is represented in the CSV in datetime format, but as a string.
I am looking for a fast way to build the datetime object without resorting to running it through datetime.strptime(row[1], "(%Y, %m, %d, %H, %M, %S)"). It seems counter-intuitive to have to interpret the date with strptime when it's ready to drop in as-is.
You can use ast.literal_eval to convert the string to a tuple of integers:
>>> import ast
>>> ast.literal_eval('(2011, 12, 11, 15, 45, 20)')
(2011, 12, 11, 15, 45, 20)
You can then unpack this tuple (see e.g. What does ** (double star) and * (star) do for parameters?) straight into the datetime constructor:
>>> import datetime
>>> datetime.datetime(*ast.literal_eval('(2011, 12, 11, 15, 45, 20)'))
datetime.datetime(2011, 12, 11, 15, 45, 20)
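Applied while reading a file, the same conversion might look like the sketch below. The sample line, the quoting of the date field, and the helper name parse_rows are assumptions for illustration, not details taken from the question.

```python
import ast
import csv
import datetime
import io

# Hypothetical sample data; the real files may quote the date field differently.
SAMPLE = '0,"(2011, 12, 11, 15, 45, 20)",Arduino/libraries/dallas-temperature-control/\n'

def parse_rows(lines, date_col=1):
    """Yield CSV rows with the date column replaced by a datetime object."""
    for row in csv.reader(lines):
        # literal_eval turns the string into a tuple of ints,
        # which is then unpacked into the datetime constructor.
        row[date_col] = datetime.datetime(*ast.literal_eval(row[date_col]))
        yield row

rows = list(parse_rows(io.StringIO(SAMPLE)))
print(rows[0][1])  # 2011-12-11 15:45:20
```

For a real file, an open file object can be passed in place of the io.StringIO wrapper.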
As @jonrhsarpe said in his answer, you can use ast.literal_eval to convert the string to a tuple and then unpack it into the datetime constructor.
But based on the following tests, it seems the faster method is still datetime.datetime.strptime().
Code -
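import datetime
import ast

# Evaluate the string as a tuple of ints, then unpack it into datetime()
def func1(datestring):
    return datetime.datetime(*ast.literal_eval(datestring))

# Parse the string directly with a matching format
def func2(datestring):
    return datetime.datetime.strptime(datestring, '(%Y, %m, %d, %H, %M, %S)')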
Timing information -
In [39]: %timeit func1("(2011, 12, 11, 15, 45, 20)")
10000 loops, best of 3: 30.1 µs per loop
In [40]: %timeit func2("(2011, 12, 11, 15, 45, 20)")
10000 loops, best of 3: 26.9 µs per loop
In [41]: %timeit func1("(2011, 12, 11, 15, 45, 20)")
10000 loops, best of 3: 38.6 µs per loop
In [42]: %timeit func2("(2011, 12, 11, 15, 45, 20)")
10000 loops, best of 3: 28.8 µs per loop
In [43]: %timeit func1("(2011, 12, 11, 15, 45, 20)")
10000 loops, best of 3: 31.2 µs per loop
In [44]: %timeit func2("(2011, 12, 11, 15, 45, 20)")
10000 loops, best of 3: 29.5 µs per loop
In [45]: %timeit func1("(2011, 12, 11, 15, 45, 20)")
The slowest run took 5.51 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 32.6 µs per loop
In [46]: %timeit func2("(2011, 12, 11, 15, 45, 20)")
The slowest run took 15.42 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 27.5 µs per loop
In [47]: %timeit func1("(2011, 12, 11, 15, 45, 20)")
10000 loops, best of 3: 49.2 µs per loop
In [48]: %timeit func2("(2011, 12, 11, 15, 45, 20)")
10000 loops, best of 3: 24.4 µs per loop
I am not sure why datetime.datetime.strptime() seems counter-intuitive to you, but I would say that for parsing strings into datetime objects, you should use strptime().
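The comparison above can also be repeated outside IPython with the standard timeit module. This is only a rough sketch: the function names and the best_us helper are made up for illustration, and the absolute numbers will vary with machine and Python version.

```python
import ast
import datetime
import timeit

FMT = "(%Y, %m, %d, %H, %M, %S)"
SAMPLE = "(2011, 12, 11, 15, 45, 20)"

def via_literal_eval(s):
    return datetime.datetime(*ast.literal_eval(s))

def via_strptime(s):
    return datetime.datetime.strptime(s, FMT)

# Both approaches should produce the same datetime before we compare speed.
assert via_literal_eval(SAMPLE) == via_strptime(SAMPLE)

def best_us(func, number=10_000, repeat=3):
    """Best-of-`repeat` time per call, in microseconds."""
    times = timeit.repeat(lambda: func(SAMPLE), number=number, repeat=repeat)
    return min(times) / number * 1e6

results = {f.__name__: best_us(f) for f in (via_literal_eval, via_strptime)}
for name, us in results.items():
    print(f"{name}: {us:.1f} us per call")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports in the session above.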