Should time.perf_counter() be consistent across processes in Python on Windows?

UPDATE: A fix for this bug has been committed and will debut in Python 3.10, expected to be released Oct 2021. See the bug report for details.

The documentation for time.perf_counter() indicates that it is system-wide:

time.perf_counter() → float

Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. It does include time elapsed during sleep and is system-wide. The reference point of the returned value is undefined, so that only the difference between the results of consecutive calls is valid.

Am I incorrect in interpreting system-wide to include consistency across processes?

As shown below, it appears to be consistent on Linux, but not on Windows. In addition, the Windows behavior with Python 3.6 differs significantly from that with 3.7.

I'd appreciate it if anyone can point out documentation or bug reports covering this behavior.

Test case

import concurrent.futures
import time

def worker():
    return time.perf_counter()

if __name__ == '__main__':
    pool = concurrent.futures.ProcessPoolExecutor()
    futures = []
    for i in range(3):
        print('Submitting worker {:d} at time.perf_counter() == {:.3f}'.format(i, time.perf_counter()))
        futures.append(pool.submit(worker))
        time.sleep(1)

    for i, f in enumerate(futures):
        print('Worker {:d} started at time.perf_counter() == {:.3f}'.format(i, f.result()))

Results on Windows 7

C:\...>Python36\python.exe -VV
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)]

C:\...>Python36\python.exe perf_counter_across_processes.py
Submitting worker 0 at time.perf_counter() == 0.000
Submitting worker 1 at time.perf_counter() == 1.169
Submitting worker 2 at time.perf_counter() == 2.170
Worker 0 started at time.perf_counter() == 0.000
Worker 1 started at time.perf_counter() == 0.533
Worker 2 started at time.perf_counter() == 0.000

C:\...>Python37\python.exe -VV
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)]

C:\...>Python37\python.exe perf_counter_across_processes.py
Submitting worker 0 at time.perf_counter() == 0.376
Submitting worker 1 at time.perf_counter() == 1.527
Submitting worker 2 at time.perf_counter() == 2.529
Worker 0 started at time.perf_counter() == 0.380
Worker 1 started at time.perf_counter() == 0.956
Worker 2 started at time.perf_counter() == 1.963

I've omitted further results on Windows for brevity, but the same behavior was observed on Windows 8.1. In addition, Python 3.6.7 behaved the same as 3.6.8, while Python 3.7.1 behaved the same as 3.7.3.

Results on Ubuntu 18.04.1 LTS

$ python3 -VV
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0]

$ python3 perf_counter_across_processes.py 
Submitting worker 0 at time.perf_counter() == 2075.896
Submitting worker 1 at time.perf_counter() == 2076.900
Submitting worker 2 at time.perf_counter() == 2077.903
Worker 0 started at time.perf_counter() == 2075.900
Worker 1 started at time.perf_counter() == 2076.902
Worker 2 started at time.perf_counter() == 2077.905

$ python3.7 -VV
Python 3.7.1 (default, Oct 22 2018, 11:21:55) 
[GCC 8.2.0]

$ python3.7 perf_counter_across_processes.py 
Submitting worker 0 at time.perf_counter() == 1692.514
Submitting worker 1 at time.perf_counter() == 1693.518
Submitting worker 2 at time.perf_counter() == 1694.520
Worker 0 started at time.perf_counter() == 1692.517
Worker 1 started at time.perf_counter() == 1693.519
Worker 2 started at time.perf_counter() == 1694.522

asked Jun 07 '19 23:06

echo

1 Answer

On Windows, time.perf_counter is based on the WinAPI function QueryPerformanceCounter (QPC). This counter is system-wide. For more information, see Microsoft's "Acquiring high-resolution time stamps".

That said, perf_counter on Windows returns a value that is relative to a per-process startup value, so the returned value is not system-wide. It does this to reduce precision loss when converting the integer counter to a float, which has only 15 decimal digits of precision. Using a relative value is uncalled for in most cases, which need only microsecond precision. There should be an optional parameter to query the true QPC counter value, especially for perf_counter_ns in 3.7+.
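To make the effect of a per-process reference point concrete, here is a toy model. It is only a sketch: RelativeClock and the raw tick values are hypothetical names and numbers chosen for illustration, not CPython's actual implementation.

```python
class RelativeClock:
    """Toy model of a counter that is reported relative to a per-process base."""

    def __init__(self, raw_at_start):
        # Each process captures its own base value when it starts up.
        self._base = raw_at_start

    def read(self, raw_now):
        # The reported value is the shared raw counter minus the private base.
        return raw_now - self._base


# Hypothetical raw tick values of one shared, system-wide counter.
parent = RelativeClock(raw_at_start=1000)  # parent process started first
child = RelativeClock(raw_at_start=1500)   # worker process started later

raw_now = 2000  # both processes sample the shared counter at the same instant
print('parent reads', parent.read(raw_now))
print('child reads', child.read(raw_now))
```

The two readings differ by exactly the difference between the two base values, even though both were taken from the same underlying counter at the same moment. Differences between readings taken within a single process are unaffected.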

Regarding the different initial values returned by perf_counter in 3.6 vs 3.7, the implementation has changed a bit over time. In 3.6.8, perf_counter is implemented in Modules/timemodule.c, so the initial value is stored when the time module is first imported and initialized, which is why you see the first result as 0.000 seconds. In more recent releases it's implemented separately in Python's C API. For example, see "Python/pytime.c" in the latest 3.8 beta release. In this case, by the time Python code calls time.perf_counter(), the counter has incremented well past the startup value.

Here's an alternative implementation based on ctypes that uses the system-wide QPC value instead of a relative value.

import sys

if sys.platform != 'win32':
    from time import perf_counter
    try:
        from time import perf_counter_ns
    except ImportError:
        def perf_counter_ns():
            """perf_counter_ns() -> int

            Performance counter for benchmarking as nanoseconds.
            """
            return int(perf_counter() * 10**9)
else:
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

    kernel32.QueryPerformanceFrequency.argtypes = (
        wintypes.PLARGE_INTEGER,) # lpFrequency

    kernel32.QueryPerformanceCounter.argtypes = (
        wintypes.PLARGE_INTEGER,) # lpPerformanceCount

    _qpc_frequency = wintypes.LARGE_INTEGER()
    if not kernel32.QueryPerformanceFrequency(ctypes.byref(_qpc_frequency)):
        raise ctypes.WinError(ctypes.get_last_error())
    _qpc_frequency = _qpc_frequency.value

    def perf_counter_ns():
        """perf_counter_ns() -> int

        Performance counter for benchmarking as nanoseconds.
        """
        count = wintypes.LARGE_INTEGER()
        if not kernel32.QueryPerformanceCounter(ctypes.byref(count)):
            raise ctypes.WinError(ctypes.get_last_error())
        return (count.value * 10**9) // _qpc_frequency

    def perf_counter():
        """perf_counter() -> float

        Performance counter for benchmarking.
        """
        count = wintypes.LARGE_INTEGER()
        if not kernel32.QueryPerformanceCounter(ctypes.byref(count)):
            raise ctypes.WinError(ctypes.get_last_error())
        return count.value / _qpc_frequency

QPC typically has a resolution of 0.1 microseconds. A float in CPython has 15 decimal digits of precision. So this implementation of perf_counter is within the QPC resolution for an uptime of about 3 years.
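The arithmetic behind that estimate can be checked with a short sketch. It assumes the typical 0.1 microsecond QPC tick mentioned above and Python 3.9+ for math.ulp; the constant names are my own.

```python
import math
import sys

QPC_RESOLUTION_S = 1e-7  # assumed typical QPC tick of 0.1 microseconds
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# With sys.float_info.dig (15) reliable decimal digits, a float can
# distinguish roughly 10**15 tick values.
max_ticks = 10 ** sys.float_info.dig
uptime_years = max_ticks * QPC_RESOLUTION_S / SECONDS_PER_YEAR
print('Decimal-digit estimate: {:.2f} years'.format(uptime_years))

# Binary view: math.ulp(x) is the gap between x and the next larger float,
# i.e. the finest step a float timestamp of x seconds can express.
for label, seconds in [('1 day', 86400.0), ('3 years', 3 * SECONDS_PER_YEAR)]:
    print('{}: float spacing is {:.3g} s'.format(label, math.ulp(seconds)))
```

The decimal-digit estimate comes out at a little over 3 years. The float spacing printed for a 3-year uptime is still below the assumed 0.1 microsecond tick, so the 3-year figure is on the conservative side.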

answered Sep 21 '22 01:09

Eryk Sun


Tags:

time

python-3.x

windows

multiprocessing