 

Dictionary searching with datetime keys

I have time series data that I am currently storing in a dictionary where the dictionary 'keys' are datetime.datetime objects. Something along the lines of:

data[datetime.datetime(2012,5,14,15,28,2)]={'error':error,'flags':flags,'value':value}

The question I have is: what is the fastest way to find the two closest times (one before and one after) to a specified time? I need this function to be as fast as possible because it is called ~10,000 times inside a loop that linearly interpolates between the two closest points.


I currently have one method working which takes a ridiculously long time because it searches through all the keys (~50,000):

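def findTime(time):
    # Linear scan over every key: O(n) work on each call
    bdt = float('inf')  # smallest gap so far to a key before `time`
    adt = float('inf')  # smallest gap so far to a key after `time`
    minKey = False
    maxKey = False
    for key in data:
        dt = (time - key).total_seconds()
        if 0 < dt < bdt:      # key is before `time` and closer than the best so far
            bdt = dt
            minKey = key
        elif -adt < dt < 0:   # key is after `time` and closer than the best so far
            adt = -dt
            maxKey = key
    return minKey, maxKey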

My attempt at using bisect:

def findTime(time):
    keys=data.keys()
    l,r = bisect.bisect_left(time,keys), bisect.bisect_right(time,keys)
    return l,r

Unfortunately, this produces an error:

TypeError: 'datetime.datetime' object does not support indexing

Any help would be appreciated.

asked Sep 03 '25 04:09 by Onlyjus



2 Answers

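The bisect functions take a sorted sequence (a list, or really anything that supports len() and indexing) as their first argument and the value to look up as their second. keys is not sorted, and you're passing it as the second argument, so bisect ends up treating the datetime as the sequence.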

This should work:

import bisect

def findTime(time):
    keys = sorted(data.keys())
    # Insertion indices of `time` in the sorted key list (not the keys themselves)
    return bisect.bisect_left(keys, time), bisect.bisect_right(keys, time)

Also, as long as the data hasn't changed between searches, keep the sorted copy around rather than re-sorting on every call: sorting is O(n log n), while each bisect is only O(log n).
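A minimal sketch of that advice, assuming the dict is not modified between lookups: sort the keys once, then turn the bisect insertion points into the neighboring keys the question asks for. build_index, find_neighbors, and the sample data are hypothetical names and values for illustration.

```python
import bisect
import datetime


def build_index(data):
    """Hypothetical helper: sort the dict's keys once; reuse until data changes."""
    return sorted(data)


def find_neighbors(sorted_keys, time):
    """Hypothetical helper: return (before, after), the closest keys <= time
    and >= time, or None on a side where no such key exists.
    An exact match is returned on both sides.
    """
    # bisect_right gives the index of the first key strictly greater than
    # `time`, so sorted_keys[i - 1] is the last key <= time.
    i = bisect.bisect_right(sorted_keys, time)
    before = sorted_keys[i - 1] if i > 0 else None
    # bisect_left gives the index of the first key >= time.
    j = bisect.bisect_left(sorted_keys, time)
    after = sorted_keys[j] if j < len(sorted_keys) else None
    return before, after


# Made-up three-point series for illustration.
data = {
    datetime.datetime(2012, 5, 14, 15, 28, 2): {'value': 1.0},
    datetime.datetime(2012, 5, 14, 15, 28, 10): {'value': 3.0},
    datetime.datetime(2012, 5, 14, 15, 28, 20): {'value': 2.0},
}
keys = build_index(data)
before, after = find_neighbors(keys, datetime.datetime(2012, 5, 14, 15, 28, 5))
```

In the interpolation loop, build_index would be called once before the loop and find_neighbors on each iteration, so each lookup is O(log n) instead of a scan over all ~50,000 keys.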

answered Sep 05 '25 19:09 by torek



You are far better off using a different key for your dict.

Two options are obvious:

1) You can use an ISO 8601 date string as the key. For a date alone this is YYYY-MM-DD; with a time of day it is YYYY-MM-DDTHH:MM:SS. A useful property of ISO 8601 strings is that, as long as every key uses the same format and time zone, lexical (plain string) order is the same as chronological order, so in a sorted list of keys you just take the two keys on either side of the insertion point.

2) You can use a numeric (float) representation of the time, counted from a fixed epoch. Excel and the Windows OLE Automation date type store a date as a day count from their epoch, with the time of day as the fractional part (easily converted back to HH:MM:SS), while Unix time is the number of seconds since 1970-01-01 00:00:00 UTC.

Example of 1):

>>> datetime.datetime.fromtimestamp(time.time()).isoformat()
'2012-05-14T13:55:22.142548'  # a hashable, sortable dict key based on time
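A sketch of option 1, assuming every key is a naive ISO 8601 string with the same precision (the keys below are made up): a plain string sort gives chronological order, and bisect on the query's isoformat() finds the neighbors.

```python
import bisect
import datetime

# Made-up ISO 8601 string keys. All share one format (no time zone, whole
# seconds), which is what makes lexical order match chronological order.
data = {
    '2012-05-14T15:28:02': {'value': 1.0},
    '2012-05-14T15:28:10': {'value': 3.0},
    '2012-05-14T09:00:00': {'value': 0.5},
}
sorted_keys = sorted(data)  # plain string sort == chronological sort here

query = datetime.datetime(2012, 5, 14, 15, 28, 5).isoformat()
i = bisect.bisect_left(sorted_keys, query)  # insertion point for the query
before = sorted_keys[i - 1] if i > 0 else None  # last key < query
after = sorted_keys[i] if i < len(sorted_keys) else None  # first key >= query
```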

Example of 2):

>>> time.time()               # seconds (and fractions of a second) since 1970-01-01 UTC
1337028447.499273             # THAT is your dict key
>>> datetime.datetime.fromtimestamp(time.time()).timetuple()
time.struct_time(tm_year=2012, tm_mon=5, tm_mday=14, tm_hour=13, tm_min=52, tm_sec=13, tm_wday=0, tm_yday=135, tm_isdst=-1)
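With float keys the interpolation weight is plain arithmetic. A minimal sketch, assuming timezone-aware UTC datetimes are converted with datetime.timestamp() (to sidestep local-time ambiguity) and that the query lies within the data range; ts, interpolate, and the values are hypothetical.

```python
import bisect
import datetime

UTC = datetime.timezone.utc


def ts(*fields):
    """Hypothetical helper: UTC datetime fields -> float Unix timestamp."""
    return datetime.datetime(*fields, tzinfo=UTC).timestamp()


# Made-up series keyed by float timestamps.
series = {
    ts(2012, 5, 14, 15, 28, 2): 1.0,
    ts(2012, 5, 14, 15, 28, 12): 3.0,
}
times = sorted(series)  # sort once, reuse for every lookup


def interpolate(t):
    """Linearly interpolate the value at float timestamp t."""
    i = bisect.bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return series[t]  # exact hit, no interpolation needed
    if i == 0 or i == len(times):
        raise ValueError('t is outside the range of the data')
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)  # fraction of the way from t0 to t1
    return series[t0] + w * (series[t1] - series[t0])


value = interpolate(ts(2012, 5, 14, 15, 28, 7))
```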

In either case, Python can sort 50,000 such keys in milliseconds, and each lookup afterward is an O(log n) bisect rather than a scan of every key.

Convert the timestamp back to a datetime object as needed.
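Converting keys back to datetime objects is one call in each case. A sketch assuming UTC-aware datetimes; datetime.fromisoformat requires Python 3.7 or later.

```python
import datetime

UTC = datetime.timezone.utc
dt = datetime.datetime(2012, 5, 14, 15, 28, 2, tzinfo=UTC)

# Option 2: datetime -> float key -> datetime
ts = dt.timestamp()
back_from_float = datetime.datetime.fromtimestamp(ts, tz=UTC)

# Option 1: datetime -> ISO 8601 string key -> datetime
iso = dt.isoformat()
back_from_iso = datetime.datetime.fromisoformat(iso)
```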

answered Sep 05 '25 18:09 by dawg
