Running it on Ubuntu 14 with Python 2.7.6
I simplified script to show my problem:
import time
import multiprocessing
data = range(1, 3)
start_time = time.clock()
def lol():
for i in data:
print time.clock() - start_time, "lol seconds"
def worker(n):
print time.clock() - start_time, "multiprocesor seconds"
def mp_handler():
p = multiprocessing.Pool(1)
p.map(worker, data)
if __name__ == '__main__':
lol()
mp_handler()
And the output:
8e-06 lol seconds
6.9e-05 lol seconds
-0.030019 multiprocesor seconds
-0.029907 multiprocesor seconds
Process finished with exit code 0
Using time.time()
gives non-negative values (as marked here Timer shows negative time elapsed)
but I'm curious what is the problem with time.clock()
in python multiprocessing
and reading time from CPU.
To add a concise Python 3 example to Bakuriu's excellent answer above you can use the following method to get a global timer independent of the subprocesses:
import multiprocessing as mp
import time
# create iterable
iterable = range(4)
# adds three to the given element
def add_3(num):
a = num + 3
return a
# multiprocessing attempt
def main():
pool = mp.Pool(2)
results = pool.map(add_3, iterable)
return results
if __name__ == "__main__": #Required not to spawn deviant children
start=time.time()
results = main()
print(list(results))
elapsed = (time.time() - start)
print("\n","time elapsed is :", elapsed)
Note that if we had instead used time.process_time()
instead of time.time()
we will get an undesired result.
multiprocessing
spawns new processes and time.clock()
on linux has the same meaning of the C's clock()
:
The value returned is the CPU time used so far as a clock_t;
So the values returned by clock
restart from 0
when a process start. However your code uses the parent's process start_time
to determine the time spent in the child process, which is obviously incorrect if the child processes CPU time resets.
The clock()
function makes sense only when handling one process, because its return value is the CPU time spent by that process. Child processes are not taken into account.
The time()
function on the other hand uses a system-wide clock, and thus can be used even between different processes (although it is not monotonic, so it might return wrong results if somebody changes the system time during the events).
Forking a running python instance is probably faster then starting a new one from scratch, hence start_time
is almost always bigger then the value returned by time.clock()
.
Take into account that the parent process also had to read your file on disk, perform the imports which may require reading other .py
files, searching directories etc.
The forked child processes don't have to do all that.
Example code that shows that the return value of time.clock()
resets to 0
:
from __future__ import print_function
import time
import multiprocessing
data = range(1, 3)
start_time = time.clock()
def lol():
for i in data:
t = time.clock()
print('t: ', t, end='\t')
print(t - start_time, "lol seconds")
def worker(n):
t = time.clock()
print('t: ', t, end='\t')
print(t - start_time, "multiprocesor seconds")
def mp_handler():
p = multiprocessing.Pool(1)
p.map(worker, data)
if __name__ == '__main__':
print('start_time', start_time)
lol()
mp_handler()
Result:
$python ./testing.py
start_time 0.020721
t: 0.020779 5.8e-05 lol seconds
t: 0.020804 8.3e-05 lol seconds
t: 0.001036 -0.019685 multiprocesor seconds
t: 0.001166 -0.019555 multiprocesor seconds
Note how t
is monotonic for the lol
case while goes back to 0.001
in the other case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With