Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

compare list of datetime to dict of datetime

I have a task to create sets of dates based on specific condition, for example "greater than 2" will be passed and I need to create a set of all dates in this month that have a day > 2. also Ill be getting a start time and a stop time for e.g. 10am-6pm in this case I will create a set of all the dates > 2 and in every day it has a time to start at 10am and ends and 6pm, below is an example:

greater > 2 less < 9 
start time :10am
stop time :6 pm
month:july
date1: 2016-07-03 10:00, 2016-07-03 16:00
date2: 2016-07-04 10:00, 2016-07-04 16:00
date3: 2016-07-05 10:00, 2016-07-05 16:00
.
.
.
date6: 2016-07-8 10:00, 2016-07-8 16:00

I decided to store these dates into a dictionary like the following:

dictD = {'dates_between_2_9':[[2016-07-03 10:00, 2016-07-03 16:00], [2016-07-04 10:00, 2016-07-04 16:00], ....., [2016-07-08 10:00, 2016-07-08 16:00]]} 

I used the dict because I will have multiple conditions that I need to create sets of dates for them, so there will be for example another key other than dates_between_2_5.

at the other hand I get another request based on a condition too to create dates with start time only like the following:

greater > 1 less than 12
start time : 2pm
    date1: 2016-07-02 14:00
    date2: 2016-07-03 14:00
    date3: 2016-07-04 14:00
    .
    .
    .
    date10: 2016-07-11 14:00

I decided to store these dates in a list:

listL = [2016-07-02 14:00,2016-07-03 14:00,2016-07-04 14:00 ... 2016-07-11 14:00]

after that I compare each date from ListL to the list of dates for each key from DictD and if a date from ListL lies within a start,stop time then I should remove it from the list and return only the dates from ListL that don't overlap with dates from DictD, my logic is like the following:

for L from ListL:
    for every key in DictD:
        for item from DictD[key]:
            if DictD[key][0] < L < DictD[key][1] # check if item from list overlap with start,stop time from dictionary.
                ListL.remove(L) # I know I can't remove items from list while iterating so I will probably create a set and store all overlapped items and then subtract this set to set(ListL) to get the difference. 
return ListL

My question is, am I using an efficient data structures to handle my requirements? I see my logic is not that efficient so I was wondering if there is a better way for approaching this problem?

any help would be greatly appreciated. thanks in advance!

like image 371
tkyass Avatar asked Jul 01 '16 20:07

tkyass


2 Answers

It sounds like you're trying to optimize your algorithm. To be honest, with data this size, it's probably not necessary. However, if you are interested, the general rule of thumb is that sets are faster than lists in Python when checking for membership.

In this case, it's not clear what your sets might be. I've assumed that you have at most a minute-level of granularity, but you could go lower (for more memory) or indeed improve occupancy and performance by going for a larger granularity - e.g. hours. This code shows even relatively large sets can be at least 5x faster (and look a little simpler when comparing your data sets):

from copy import copy
from datetime import datetime, timedelta
from timeit import timeit
import time

def make_range(start, open, close, days):
    result = []
    base_start = start + open
    base_close = start + close
    while days > 0:
        result.append([base_start, base_close])
        base_start += timedelta(days=1)
        base_close += timedelta(days=1)
        days -= 1
    return result

def make_range2(start, open, close, days):
    result = set()
    base_start = start + open
    base_close = start + close
    while days > 0:
        now = base_start
        while now <= base_close:
            result.add(now)
            now += timedelta(minutes=1)
        base_start += timedelta(days=1)
        base_close += timedelta(days=1)
        days -= 1
    return result

dateRange = {
    'range1': make_range(datetime(2016, 7, 3, 0, 0),
                         timedelta(hours=10),
                         timedelta(hours=18),
                         6),
}

dateRange2 = {
    'range1': make_range2(datetime(2016, 7, 3, 0, 0),
                          timedelta(hours=10),
                          timedelta(hours=18),
                          6),
}

dateList = [
    datetime(2016, 7, 2, 14, 0),
    datetime(2016, 7, 3, 14, 0),
    datetime(2016, 7, 4, 14, 0),
    datetime(2016, 7, 5, 14, 0),
    datetime(2016, 7, 6, 14, 0),
    datetime(2016, 7, 7, 14, 0),
    datetime(2016, 7, 8, 14, 0),
    datetime(2016, 7, 9, 14, 0),
    datetime(2016, 7, 10, 14, 0),
    datetime(2016, 7, 11, 14, 0)
]

dateSet = set(dateList)

def f1():
    result = copy(dateList)
    for a in dateList:
        for b in dateRange:
            for i in dateRange[b]:
                if i[0] <= a <= i[1]:
                    result.remove(a)
    return result

def f2():
    result = copy(dateSet)
    for b in dateRange2:
        result = result.difference(dateRange2[b])
    return result

print(f1())
print(timeit("f1()", "from __main__ import f1", number=100000))

print(f2())
print(timeit("f2()", "from __main__ import f2", number=100000))

For the record, the results are as follows:

[datetime.datetime(2016, 7, 2, 14, 0), datetime.datetime(2016, 7, 9, 14, 0), datetime.datetime(2016, 7, 10, 14, 0), datetime.datetime(2016, 7, 11, 14, 0)]
1.922587754837455

{datetime.datetime(2016, 7, 2, 14, 0), datetime.datetime(2016, 7, 9, 14, 0), datetime.datetime(2016, 7, 10, 14, 0), datetime.datetime(2016, 7, 11, 14, 0)}
0.30558400587733225

You could also convert the dict dateRange to a list, but with just 1 or 2 members, this is unlikely to make any real difference in performance. However, it makes more logical sense, as you are not actually using the dict to look up any specific key values - you are just iterating through all the values.

like image 62
Peter Brittain Avatar answered Oct 15 '22 05:10

Peter Brittain


Frankly speaking I am not sure if I understand what is your problem, I tried something like this:

for date in dateList:
    for everyrange in dateRange:
        find=False
        for i in dateRange[everyrange]:
            #print('date={date} ,key={everyrange},i={i}'.format(date=date, everyrange=everyrange,i=i))
            if i[0] <= date <= i[1]:
                print(date)
                find=True
                break
            else:
                print(0)
        if find:
            break
like image 42
sebnorth Avatar answered Oct 15 '22 05:10

sebnorth