I have a task to create sets of dates based on specific condition, for example "greater than 2" will be passed and I need to create a set of all dates in this month that have a day > 2. also Ill be getting a start time and a stop time for e.g. 10am-6pm in this case I will create a set of all the dates > 2 and in every day it has a time to start at 10am and ends and 6pm, below is an example:
greater > 2 less < 9
start time :10am
stop time :6 pm
month:july
date1: 2016-07-03 10:00, 2016-07-03 16:00
date2: 2016-07-04 10:00, 2016-07-04 16:00
date3: 2016-07-05 10:00, 2016-07-05 16:00
.
.
.
date6: 2016-07-8 10:00, 2016-07-8 16:00
I decided to store these dates into a dictionary like the following:
dictD = {'dates_between_2_9':[[2016-07-03 10:00, 2016-07-03 16:00], [2016-07-04 10:00, 2016-07-04 16:00], ....., [2016-07-08 10:00, 2016-07-08 16:00]]}
I used the dict because I will have multiple conditions that I need to create sets of dates for them, so there will be for example another key other than dates_between_2_5.
at the other hand I get another request based on a condition too to create dates with start time only like the following:
greater > 1 less than 12
start time : 2pm
date1: 2016-07-02 14:00
date2: 2016-07-03 14:00
date3: 2016-07-04 14:00
.
.
.
date10: 2016-07-11 14:00
I decided to store these dates in a list:
listL = [2016-07-02 14:00,2016-07-03 14:00,2016-07-04 14:00 ... 2016-07-11 14:00]
after that I compare each date from ListL to the list of dates for each key from DictD and if a date from ListL lies within a start,stop time then I should remove it from the list and return only the dates from ListL that don't overlap with dates from DictD, my logic is like the following:
for L from ListL:
for every key in DictD:
for item from DictD[key]:
if DictD[key][0] < L < DictD[key][1] # check if item from list overlap with start,stop time from dictionary.
ListL.remove(L) # I know I can't remove items from list while iterating so I will probably create a set and store all overlapped items and then subtract this set to set(ListL) to get the difference.
return ListL
My question is, am I using an efficient data structures to handle my requirements? I see my logic is not that efficient so I was wondering if there is a better way for approaching this problem?
any help would be greatly appreciated. thanks in advance!
It sounds like you're trying to optimize your algorithm. To be honest, with data this size, it's probably not necessary. However, if you are interested, the general rule of thumb is that sets are faster than lists in Python when checking for membership.
In this case, it's not clear what your sets might be. I've assumed that you have at most a minute-level of granularity, but you could go lower (for more memory) or indeed improve occupancy and performance by going for a larger granularity - e.g. hours. This code shows even relatively large sets can be at least 5x faster (and look a little simpler when comparing your data sets):
from copy import copy
from datetime import datetime, timedelta
from timeit import timeit
import time
def make_range(start, open, close, days):
result = []
base_start = start + open
base_close = start + close
while days > 0:
result.append([base_start, base_close])
base_start += timedelta(days=1)
base_close += timedelta(days=1)
days -= 1
return result
def make_range2(start, open, close, days):
result = set()
base_start = start + open
base_close = start + close
while days > 0:
now = base_start
while now <= base_close:
result.add(now)
now += timedelta(minutes=1)
base_start += timedelta(days=1)
base_close += timedelta(days=1)
days -= 1
return result
dateRange = {
'range1': make_range(datetime(2016, 7, 3, 0, 0),
timedelta(hours=10),
timedelta(hours=18),
6),
}
dateRange2 = {
'range1': make_range2(datetime(2016, 7, 3, 0, 0),
timedelta(hours=10),
timedelta(hours=18),
6),
}
dateList = [
datetime(2016, 7, 2, 14, 0),
datetime(2016, 7, 3, 14, 0),
datetime(2016, 7, 4, 14, 0),
datetime(2016, 7, 5, 14, 0),
datetime(2016, 7, 6, 14, 0),
datetime(2016, 7, 7, 14, 0),
datetime(2016, 7, 8, 14, 0),
datetime(2016, 7, 9, 14, 0),
datetime(2016, 7, 10, 14, 0),
datetime(2016, 7, 11, 14, 0)
]
dateSet = set(dateList)
def f1():
result = copy(dateList)
for a in dateList:
for b in dateRange:
for i in dateRange[b]:
if i[0] <= a <= i[1]:
result.remove(a)
return result
def f2():
result = copy(dateSet)
for b in dateRange2:
result = result.difference(dateRange2[b])
return result
print(f1())
print(timeit("f1()", "from __main__ import f1", number=100000))
print(f2())
print(timeit("f2()", "from __main__ import f2", number=100000))
For the record, the results are as follows:
[datetime.datetime(2016, 7, 2, 14, 0), datetime.datetime(2016, 7, 9, 14, 0), datetime.datetime(2016, 7, 10, 14, 0), datetime.datetime(2016, 7, 11, 14, 0)]
1.922587754837455
{datetime.datetime(2016, 7, 2, 14, 0), datetime.datetime(2016, 7, 9, 14, 0), datetime.datetime(2016, 7, 10, 14, 0), datetime.datetime(2016, 7, 11, 14, 0)}
0.30558400587733225
You could also convert the dict dateRange to a list, but with just 1 or 2 members, this is unlikely to make any real difference in performance. However, it makes more logical sense, as you are not actually using the dict to look up any specific key values - you are just iterating through all the values.
Frankly speaking I am not sure if I understand what is your problem, I tried something like this:
for date in dateList:
for everyrange in dateRange:
find=False
for i in dateRange[everyrange]:
#print('date={date} ,key={everyrange},i={i}'.format(date=date, everyrange=everyrange,i=i))
if i[0] <= date <= i[1]:
print(date)
find=True
break
else:
print(0)
if find:
break
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With