I have data files containing lists of strings representing ISO formatted dates. Currently, I am reading them in using:
mydates = [datetime.datetime.strptime(timedata[x], "%Y-%m-%dT%H:%M:%S") for x in range(len(timedata))]
This looks quite straightforward, but it is ridiculously slow when operating on huge lists of ~25,000 dates: about 0.34 seconds per converted list. Since I have thousands of such lists, I am looking for a faster way, but I have not found one yet. The dateutil parser performs even worse...
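(For reference, a self-contained version of that baseline; the sample list here is made up for illustration.)
import datetime

timedata = ["2013-01-01T01:23:45"] * 25000  # hypothetical sample data
mydates = [datetime.datetime.strptime(s, "%Y-%m-%dT%H:%M:%S") for s in timedata]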
strptime is short for "parse time" (from a string), where strftime is for "formatting time". That is, strptime is the inverse of strftime, and the two conveniently share the same format directives. A strptime() function is available in both the datetime and time modules, parsing a string into a datetime object or a struct_time respectively.
For time.strptime(), the format parameter defaults to "%a %b %d %H:%M:%S %Y", which matches the formatting returned by ctime(). In both modules, strptime() raises ValueError if the string cannot be parsed according to the format, or if it has excess data after parsing.
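A quick round trip illustrates the pairing (a minimal sketch):
import datetime

d = datetime.datetime(2013, 1, 1, 1, 23, 45)
s = d.strftime("%Y-%m-%dT%H:%M:%S")          # datetime -> "2013-01-01T01:23:45"
d2 = datetime.datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")  # string -> datetime
assert d2 == d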
Here is a way to do it about 3x faster.
The original version:
In [23]: %timeit datetime.datetime.strptime("2013-01-01T01:23:45", "%Y-%m-%dT%H:%M:%S")
10000 loops, best of 3: 21.8 us per loop
The faster version:
In [24]: p = re.compile('[-T:]')
In [26]: %timeit datetime.datetime(*map(int, p.split("2013-01-01T01:23:45")))
100000 loops, best of 3: 7.28 us per loop
This is obviously nowhere near as flexible as strptime().
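(Packaged as a self-contained function for reuse; this is just the answer's approach with the imports spelled out.)
import datetime
import re

_split = re.compile(r'[-T:]')

def parse_iso(s):
    # '2013-01-01T01:23:45' -> datetime.datetime(2013, 1, 1, 1, 23, 45)
    return datetime.datetime(*map(int, _split.split(s)))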
edit: Using a single regex to extract the date components is marginally faster:
In [48]: pp = re.compile(r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})')
In [49]: %timeit datetime.datetime(*map(int, pp.match("2013-01-01T01:23:45").groups()))
100000 loops, best of 3: 6.92 us per loop
Indexing / slicing seems to be faster than the regex used by @NPE:
In [46]: dstr = "2013-01-01T01:23:45"
In [47]: def with_indexing(dstr):
   ....:     return datetime.datetime(*map(int, [dstr[:4], dstr[5:7], dstr[8:10],
   ....:                                         dstr[11:13], dstr[14:16], dstr[17:]]))
In [48]: p = re.compile('[-T:]')
In [49]: def with_regex(dt_str):
   ....:     return datetime.datetime(*map(int, p.split(dt_str)))
In [50]: %timeit with_regex(dstr)
100000 loops, best of 3: 3.84 us per loop
In [51]: %timeit with_indexing(dstr)
100000 loops, best of 3: 2.98 us per loop
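Note that the slicing version assumes zero-padded, fixed-width ISO strings; a malformed input surfaces as a ValueError from int(). Applied to a whole list as in the question (sample data made up):
dates = ["2013-01-01T01:23:45"] * 25000  # hypothetical input list
parsed = [with_indexing(s) for s in dates]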
I think if you use a file parser like numpy.genfromtxt with the converters argument and a fast string-parsing method, you can read and parse a whole file in less than half a second.
I used the following function to create an example file with about 25,000 rows, ISO date strings as the index, and 10 data columns:
import numpy as np
import pandas as pd

def create_data():
    # create hourly dates
    dates = pd.date_range('2010-01-01T00:30', '2013-01-04T23:30', freq='H')
    # convert to ISO date strings
    iso_dates = dates.map(lambda x: x.strftime('%Y-%m-%dT%H:%M:%S'))
    # create 10 columns of random data
    data = pd.DataFrame(np.random.random((iso_dates.size, 10)) * 100,
                        index=iso_dates)
    # write to file
    data.to_csv('dates.csv', header=False)
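Calling the function once writes dates.csv to the working directory:
create_data()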
Then I used the following code to parse the file:
In [54]: %timeit a = np.genfromtxt('dates.csv', delimiter=',', converters={0: with_regex})
1 loops, best of 3: 430 ms per loop
In [55]: %timeit a = np.genfromtxt('dates.csv', delimiter=',', converters={0: with_indexing})
1 loops, best of 3: 391 ms per loop
pandas (based on numpy) has a C-based file parser which is even faster:
In [56]: %timeit df = pd.read_csv('dates.csv', header=None, index_col=0, parse_dates=True, date_parser=with_indexing)
10 loops, best of 3: 167 ms per loop
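(A self-contained sketch of that call, assuming dates.csv from create_data() and with_indexing from above; date_parser is applied to the index column here.)
import pandas as pd

df = pd.read_csv('dates.csv', header=None, index_col=0,
                 parse_dates=True, date_parser=with_indexing)
# the index is now a DatetimeIndex, built by the fast parsing function
print(type(df.index))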