I'm currently writing a simple script that simulates a maths problem, 'The Frog Problem', presented here by Matt Parker of standupmaths on his YouTube channel. Basically, the problem is about a frog trying to hop from one side of a river to the other on lily pads in increments. My code simulates this by subtracting a random number from the number of lily pads left and continuing until that number is 0.
This is the entire thing:
import random
import datetime
from functools import partial
from multiprocessing import Pool

def frog_time(num_lillypads):
    # Count the jumps needed until no lily pads are left.
    jumps = 0
    while num_lillypads > 0:
        num_lillypads -= random.randint(1, num_lillypads)
        jumps += 1
    return jumps

def frog_run(num_lillypads, iterations=10000):
    # Running average of the jump count over `iterations` trials.
    ave = 0
    print("Running {} lillypads.".format(num_lillypads))
    for i in range(1, iterations+1):
        ave = (ave*(i-1)+frog_time(num_lillypads))/i
    return ave

def single_run(max_lillypads, iterations):
    start = datetime.datetime.now()
    results = []
    for i in range(1, max_lillypads+1):
        results.append(frog_run(i, iterations))
    time_taken = datetime.datetime.now() - start
    return time_taken

def timing_run(max_lillypads, iterations):
    start = datetime.datetime.now()
    with Pool() as pool:
        pad_nos = list(range(1, max_lillypads+1))
        # Pass `iterations` through so both runs do the same amount of work.
        results = pool.map(partial(frog_run, iterations=iterations), pad_nos)
    time_taken = datetime.datetime.now() - start
    return time_taken

def test(max=1000, iters=10000):
    print("Concurrent run")
    concurrent_time = timing_run(max, iters)
    print("Single run")
    single_time = single_run(max, iters)
    print("Single run took {} to finish.".format(single_time))
    print("Concurrent run took {} to finish.".format(concurrent_time))

# The guard keeps worker processes from re-running the test on import.
if __name__ == "__main__":
    test()
I decided to use this as an exercise to practice concurrent programming in Python, but I expected wildly different results. When I run this I get:
Single run took 0:01:55.825933 to finish.
Concurrent run took 0:02:00.110245 to finish.
I thought that the run that implemented multiprocessing would be at least a little bit faster, if not significantly faster, but in this case it actually takes longer!
Can anybody who knows more about Python multiprocessing help me out by explaining this result? Is the overhead of creating a new process for each one of these too much to make a difference, or maybe Python's random module is too slow, or is there something else wrong about this?
Right now, you aren't specifying the number of processes to set up, so Pool() will default to the maximum. According to the multiprocessing documentation:
processes is the number of worker processes to use. If processes is None then the number returned by os.cpu_count() is used.
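As a quick check, you can print the value that the quoted default refers to on your own machine. This is only a sketch: the variable name default_workers is made up for illustration, and the exact default may differ between Python versions.

```python
import os

# os.cpu_count() returns the number of logical CPUs, or None if it
# cannot be determined. `default_workers` is a name chosen for this sketch.
default_workers = os.cpu_count()
print("Pool() with no argument would use about {} workers".format(default_workers))
```

If this prints a small number such as 1 or 2, there is little room for a parallel run to beat the single run.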
Each worker process takes x amount of time to set up.
So, let's use some arbitrary values to see how we do:
- the function takes 120 seconds to run in one process
- each process takes 5 seconds to start
- each new process can divide the workload equally
If that were the case:
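One rough way to tabulate those numbers is sketched below. It assumes the total time is the evenly divided work plus one startup cost per process, with the startup costs adding up one after another, and that even a pool of one process pays the startup cost. Both are assumptions of this sketch, not measurements, and estimated_time is a hypothetical helper.

```python
def estimated_time(processes, work_seconds=120.0, startup_seconds=5.0):
    """Estimated wall time under the arbitrary values listed above.

    Assumption: the work divides evenly and each process adds its own
    startup cost in sequence.
    """
    return work_seconds / processes + startup_seconds * processes


for n in range(1, 9):
    print("{} process(es): {:.1f} s".format(n, estimated_time(n)))
```

Under these assumptions the estimate drops from 125 s with one process to 49 s with five, then climbs again (50 s with six, 55 s with eight).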
So, there is a point where adding processes stops giving you gains. You can limit the number of processes so that the time you save is still greater than the time lost creating them.
If you use Pool(2) or Pool(3), etc., you will probably see time gains and then losses again. At a much larger scale, the more processes you have the better off you would be, but at a small testing scale that may not be the case.
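To try this out, something along these lines could time the same workload with different pool sizes. It is a minimal sketch, not a benchmark: frog_run is simplified from the question, the iteration count and the number of lily pads are reduced so it finishes quickly, and time_pool is a hypothetical helper.

```python
import random
import time
from multiprocessing import Pool


def frog_time(num_lillypads):
    # Same simulation as in the question: count jumps until no pads are left.
    jumps = 0
    while num_lillypads > 0:
        num_lillypads -= random.randint(1, num_lillypads)
        jumps += 1
    return jumps


def frog_run(num_lillypads, iterations=200):
    # Much smaller default than the question's 10000 (an assumption of this sketch).
    total = 0
    for _ in range(iterations):
        total += frog_time(num_lillypads)
    return total / iterations


def time_pool(processes, max_lillypads=100):
    """Time one pool.map over 1..max_lillypads with a fixed number of workers."""
    start = time.perf_counter()
    with Pool(processes=processes) as pool:
        pool.map(frog_run, range(1, max_lillypads + 1))
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print("Pool({}) took {:.3f} s".format(n, time_pool(n)))
```

The timings depend on the machine and on how much work each task does, so compare the numbers against each other rather than against the figures in the question.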