I'm making a Monte-Carlo simulation. And as a part of this task I generate samples uniformly distributed over an interval (0,100)
.
generate = lambda: uniform(0,100)
The iterations stop when all the closest generated points' pairs meet the criteria.
check = lambda a,b: True if (b-a)<5 else False
I need to have some structure to effectively keep all the generated points and be able to go through them in ascending order to perform check
on all the subsequent pairs.
There is a heapq
module in Python which supports a very effective heap structure. And I decided to use it.
I faced a problem. I have found no traversal procedure supported by this module. The only way I found to access the values of the heap in ascending order is to use heapq.heappop
. But it deletes the values from the heap.
I found the workaround for this and just copied the heap object into the new one and iterated with heappop
over the new one. But I don't think it's quite effective to copy the whole structure in memory one every iteration.
Is there any other way I can go to do what I'm trying to do more effectively?
The simplified code for illustration.
import heapq
from random import uniform
from itertools import tee, izip, count
from copy import copy
def pairwise(iterable): #get values from iterator in pairs
a, b = tee(iterable)
next(b, None)
return izip(a, b)
check = lambda a,b: True if (b-a)<5 else False
generate = lambda: uniform(0,100)
def iterate_heap(heap):
heap = copy(heap) #Here I have to copy the heap to be able to traverse
try:
while True:
yield heapq.heappop(heap)
except IndexError:
return
def trial():
items = []
for i in count():
item = generate()
heapq.heappush(items, item)
it = iterate_heap(items)
it = pairwise(it)
if i>0 and all(check(a,b) for a,b in it): #if i==0 then 'it' returns no values and 'all' returns True
return i
print "The solution is reached. It took %d iterations." % trial()
paiwise
function is from recipe from here.
Update:
In this implementation with heappop
the complexity on each iteration is O(n*log(n))
:
Copying heap: O(n)
Adding a new value to the heap: O(log(n))
Traversing: n
elements * O(log(n))
on popping each value from heap -> O(n*log(n))
.
Result: O(n+log(n)+n*log(n)) = O(n*log(n)
But I expect the traversal to be O(n)
, so the resultant complexity would be O(n)
.
By the way, if we use just sorted list, we would need to sort the list on each adding, so O(n*log(n))
, but the traversal would be n*O(1) -> O(n)
. So, the resultant complexity is still O(n*log(n))
.
I have found a solution. It's to use bisect
module. Finding the place to add would be O(log(n))
. Adding to the list is of O(n)
(because of the implementation all the values after the insertion in place have to be moved). Traversing is O(n)
. So, the resultant complexity is O(n)
.
Still, I wounder, if there is a way to solve this task using heaps in Python.
I would use list.sort() on the heap. That leaves the heap condition intact and makes it possible to iterate over the underlying list directly.
FWIW, the Timsort algorithm used by list.sort will take advantage of the partial ordering that already exists in the heap.
From the python docs:
These two make it possible to view the heap as a regular Python list without surprises: heap[0] is the smallest item, and heap.sort() maintains the heap invariant!
Is there a reason you can't just treat the heap as a list and iterate over it?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With