I have time series data that I am currently storing in a dictionary where the dictionary 'keys' are datetime.datetime objects. Something along the lines of:
data[datetime.datetime(2012,5,14,15,28,2)]={'error':error,'flags':flags,'value':value}
The question I have is: What is the best way to find the two closest times (one before and one after) to a specified time? I need this function to be as fast as possible because it is called ~10,000 times inside a loop that linearly interpolates between the two closest points.
I currently have one method working which takes a ridiculously long time because it searches through all the keys (~50,000):
def findTime(time):
    keys=data.keys()
    bdt=10000000000000000000
    adt=10000000000000000000
    minKey=False
    maxKey=False
    for key in keys:
        dt=(time-key).total_seconds()
        if abs(dt)<bdt and dt>0:
            bdt=abs(dt)
            minKey=key
        elif abs(dt)<adt and dt<0:
            adt=abs(dt)
            maxKey=key
    return minKey,maxKey
My attempt at using bisect:
def findTime(time):
    keys=data.keys()
    l,r = bisect.bisect_left(time,keys), bisect.bisect_right(time,keys)
    return l,r
Unfortunately, this produces an error:
TypeError: 'datetime.datetime' object does not support indexing
Any help would be appreciated.
The bisect functions take as their first argument a sorted array (or list, or really, anything that can be indexed). keys is an unsorted array, and you're passing it as the second argument.
This should work:
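import bisect

def findTime(time):
    keys = sorted(data.keys())  # bisect requires a sorted sequence
    # Both calls return insertion indices into keys, not the keys themselves.
    return bisect.bisect_left(keys, time), bisect.bisect_right(keys, time)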
although you should keep the sorted copy around for repeated searches that have not altered the data, rather than re-sorting every time.
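A minimal sketch of that idea, assuming the dictionary is not modified between lookups: sort the keys once, then use bisect to pick the neighbours on either side of the query time and interpolate between them. The names find_neighbours and interpolate and the sample data are hypothetical, not from the question. Unlike the original loop, this sketch treats an exact match as both the "before" and the "after" point.

```python
import bisect
import datetime

# Hypothetical sample data in the same shape as the question's dict.
data = {
    datetime.datetime(2012, 5, 14, 15, 28, 0): {'value': 10.0},
    datetime.datetime(2012, 5, 14, 15, 28, 10): {'value': 20.0},
    datetime.datetime(2012, 5, 14, 15, 28, 20): {'value': 40.0},
}

# Sort once, outside the loop; rebuild only when data changes.
sorted_keys = sorted(data)

def find_neighbours(keys, t):
    """Return (before, after) keys around t; None where t is past an end."""
    i = bisect.bisect_left(keys, t)
    if i < len(keys) and keys[i] == t:
        return keys[i], keys[i]  # exact hit
    before = keys[i - 1] if i > 0 else None
    after = keys[i] if i < len(keys) else None
    return before, after

def interpolate(keys, t):
    """Linearly interpolate 'value' at t; None if t is outside the data."""
    before, after = find_neighbours(keys, t)
    if before is None or after is None:
        return None
    if before == after:
        return data[before]['value']
    frac = (t - before).total_seconds() / (after - before).total_seconds()
    v0, v1 = data[before]['value'], data[after]['value']
    return v0 + frac * (v1 - v0)
```

Each lookup is then a binary search over the sorted list (O(log n)) instead of a scan over all ~50,000 keys.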
You are far better off using a different key for your dict.
Two are obvious.
1) You can use ISO 8601 date format as a string. This is essentially YYYY-MM-DD format. You can also use YYYY-MM-DDTHH:MM:SS format. A property of ISO 8601 is lexical sorting: strings written in the same format sort in chronological order, so in a sorted list of keys just take the two keys above and below the insertion point.
2) You can use a float representation of the dates, for example with the integer part being a day offset from a fixed epoch and the fractional part being the fraction of the day, which is then easily converted to HH:MM:SS. Excel uses a day-based serial number of this kind; Unix time is the same idea counted in seconds since 1970-01-01 UTC.
Example of 1):
>>> datetime.datetime.fromtimestamp(time.time()).isoformat()
'2012-05-14T13:55:22.142548' # a hashable, sortable dict key based on time
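A small sketch of how such string keys could be searched, assuming every key is produced by isoformat() from naive datetimes and that the query falls inside the range of the data. The sample times and variable names are made up for illustration.

```python
import bisect
import datetime

# Hypothetical sample times, deliberately out of order.
times = [
    datetime.datetime(2012, 5, 14, 15, 28, 2),
    datetime.datetime(2011, 12, 31, 23, 59, 59),
    datetime.datetime(2012, 5, 14, 9, 0, 0),
]

# Sorting the strings gives the same order as sorting the datetimes.
iso_keys = sorted(t.isoformat() for t in times)
same_order = iso_keys == [t.isoformat() for t in sorted(times)]

# Find the keys on either side of a query time (assumed to be in range).
query = datetime.datetime(2012, 5, 14, 12, 0, 0).isoformat()
i = bisect.bisect_left(iso_keys, query)
before, after = iso_keys[i - 1], iso_keys[i]

# Convert back to a datetime when needed (fromisoformat needs Python 3.7+).
before_dt = datetime.datetime.fromisoformat(before)
```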
Example of 2):
>>> time.time() # That is seconds (with a fractional part) since 1/1/1970 UTC
1337028447.499273 # THAT is your dict key
>>> datetime.datetime.fromtimestamp(time.time()).timetuple()
time.struct_time(tm_year=2012, tm_mon=5, tm_mday=14, tm_hour=13, tm_min=52, tm_sec=13, tm_wday=0, tm_yday=135, tm_isdst=-1)
In either case, Python would be able to manage a data structure of 50,000 elements in milliseconds.
Convert the time stamp to a datetime object as needed.
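As a rough sketch of the float-key approach, assuming timezone-aware UTC datetimes so the conversion does not depend on the local time zone: store values under datetime.timestamp() floats, keep a sorted list of those floats, and bisect into it. The names samples, by_stamp, stamps and value_at are hypothetical.

```python
import bisect
import datetime

UTC = datetime.timezone.utc

# Hypothetical samples keyed by aware datetimes.
samples = {
    datetime.datetime(2012, 5, 14, 15, 28, 0, tzinfo=UTC): 10.0,
    datetime.datetime(2012, 5, 14, 15, 28, 10, tzinfo=UTC): 20.0,
    datetime.datetime(2012, 5, 14, 15, 28, 20, tzinfo=UTC): 40.0,
}

# Re-key by float seconds since the Unix epoch and sort once.
by_stamp = {dt.timestamp(): v for dt, v in samples.items()}
stamps = sorted(by_stamp)

def value_at(t):
    """Linearly interpolate the stored values at aware datetime t."""
    x = t.timestamp()
    i = bisect.bisect_left(stamps, x)
    if i < len(stamps) and stamps[i] == x:
        return by_stamp[x]  # exact hit
    if i == 0 or i == len(stamps):
        raise ValueError("time is outside the range of the data")
    x0, x1 = stamps[i - 1], stamps[i]
    frac = (x - x0) / (x1 - x0)
    return by_stamp[x0] + frac * (by_stamp[x1] - by_stamp[x0])

# Convert a float key back to a datetime only when it is needed.
first_dt = datetime.datetime.fromtimestamp(stamps[0], tz=UTC)
```

With float keys the interpolation fraction is plain arithmetic, with no timedelta objects created inside the loop.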