UPDATE: A fix for this bug has been committed and will debut in Python 3.10, expected to be released Oct 2021. See the bug report for details.
The documentation for time.perf_counter()
indicates that it is system-wide
time.perf_counter() → float
Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. It does include time elapsed during sleep and is system-wide. The reference point of the returned value is undefined, so that only the difference between the results of consecutive calls is valid.
Am I incorrect in interpreting system-wide to include consistency across processes?
As shown below, it appears to be consistent on Linux, but not on Windows. In addition, Windows behavior with Python 3.6 is significantly different to 3.7.
I'd appreciate it if anyone can point out documentation or bug reports covering this behavior.
import concurrent.futures
import time
def worker():
return time.perf_counter()
if __name__ == '__main__':
pool = concurrent.futures.ProcessPoolExecutor()
futures = []
for i in range(3):
print('Submitting worker {:d} at time.perf_counter() == {:.3f}'.format(i, time.perf_counter()))
futures.append(pool.submit(worker))
time.sleep(1)
for i, f in enumerate(futures):
print('Worker {:d} started at time.perf_counter() == {:.3f}'.format(i, f.result()))
C:\...>Python36\python.exe -VV
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)]
C:\...>Python36\python.exe perf_counter_across_processes.py
Submitting worker 0 at time.perf_counter() == 0.000
Submitting worker 1 at time.perf_counter() == 1.169
Submitting worker 2 at time.perf_counter() == 2.170
Worker 0 started at time.perf_counter() == 0.000
Worker 1 started at time.perf_counter() == 0.533
Worker 2 started at time.perf_counter() == 0.000
C:\...>Python37\python.exe -VV
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)]
C:\...>Python37\python.exe perf_counter_across_processes.py
Submitting worker 0 at time.perf_counter() == 0.376
Submitting worker 1 at time.perf_counter() == 1.527
Submitting worker 2 at time.perf_counter() == 2.529
Worker 0 started at time.perf_counter() == 0.380
Worker 1 started at time.perf_counter() == 0.956
Worker 2 started at time.perf_counter() == 1.963
I've omitted further results on Windows for brevity, but the same behavior was observed on Windows 8.1. In addition, Python 3.6.7 behaved the same as 3.6.8, while Python 3.7.1 behaved the same as 3.7.3.
$ python3 -VV
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0]
$ python3 perf_counter_across_processes.py
Submitting worker 0 at time.perf_counter() == 2075.896
Submitting worker 1 at time.perf_counter() == 2076.900
Submitting worker 2 at time.perf_counter() == 2077.903
Worker 0 started at time.perf_counter() == 2075.900
Worker 1 started at time.perf_counter() == 2076.902
Worker 2 started at time.perf_counter() == 2077.905
$ python3.7 -VV
Python 3.7.1 (default, Oct 22 2018, 11:21:55)
[GCC 8.2.0]
$ python3.7 perf_counter_across_processes.py
Submitting worker 0 at time.perf_counter() == 1692.514
Submitting worker 1 at time.perf_counter() == 1693.518
Submitting worker 2 at time.perf_counter() == 1694.520
Worker 0 started at time.perf_counter() == 1692.517
Worker 1 started at time.perf_counter() == 1693.519
Worker 2 started at time.perf_counter() == 1694.522
The perf_counter() function always returns the float value of time in seconds. Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. It does include time elapsed during sleep and is system-wide.
time. process_time() has a tick rate that is about 4.67 times faster than time. perf_counter() , which has a tick rate that is about 33,491 times faster than time.
In Windows, time.perf_counter
is based on WINAPI QueryPerformanceCounter
. This counter is system wide. For more information, see acquiring high-resolution time stamps.
That said, perf_counter
in Windows returns a value that's relative to the process startup value. Thus it is not a system-wide value. It does this in order to reduce precision loss when converting the integer value to a float
, which has only 15 decimal digits of precision. Using a relative value is uncalled for in most cases, which only need microsecond precision. There should be an optional parameter to query the true QPC counter value, especially for perf_counter_ns
in 3.7+.
Regarding the different initial values returned by perf_counter
in 3.6 vs 3.7, the implementation has changed a bit over time. In 3.6.8, perf_counter
is implemented in Modules/timemodule.c, so the initial value is stored when the time
module is first imported and initialized, which is why you see the first result as 0.000 seconds. In more recent releases it's implemented separately in Python's C API. For example, see "Python/pytime.c" in the latest 3.8 beta release. In this case, by the time Python code calls time.perf_counter()
, the counter has incremented well past the startup value.
Here's an alternative implementation based on ctypes that uses the system-wide QPC value instead of a relative value.
import sys
if sys.platform != 'win32':
from time import perf_counter
try:
from time import perf_counter_ns
except ImportError:
def perf_counter_ns():
"""perf_counter_ns() -> int
Performance counter for benchmarking as nanoseconds.
"""
return int(perf_counter() * 10**9)
else:
import ctypes
from ctypes import wintypes
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
kernel32.QueryPerformanceFrequency.argtypes = (
wintypes.PLARGE_INTEGER,) # lpFrequency
kernel32.QueryPerformanceCounter.argtypes = (
wintypes.PLARGE_INTEGER,) # lpPerformanceCount
_qpc_frequency = wintypes.LARGE_INTEGER()
if not kernel32.QueryPerformanceFrequency(ctypes.byref(_qpc_frequency)):
raise ctypes.WinError(ctypes.get_last_error())
_qpc_frequency = _qpc_frequency.value
def perf_counter_ns():
"""perf_counter_ns() -> int
Performance counter for benchmarking as nanoseconds.
"""
count = wintypes.LARGE_INTEGER()
if not kernel32.QueryPerformanceCounter(ctypes.byref(count)):
raise ctypes.WinError(ctypes.get_last_error())
return (count.value * 10**9) // _qpc_frequency
def perf_counter():
"""perf_counter() -> float
Performance counter for benchmarking.
"""
count = wintypes.LARGE_INTEGER()
if not kernel32.QueryPerformanceCounter(ctypes.byref(count)):
raise ctypes.WinError(ctypes.get_last_error())
return count.value / _qpc_frequency
QPC typically has a resolution of 0.1 microseconds. A float
in CPython has 15 decimal digits of precision. So this implementation of perf_counter
is within the QPC resolution for an uptime of about 3 years.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With