 

np.unique blocks CPU with asyncio.to_thread

I have set up the following test program (Python 3.9.5, numpy 1.20.2):

import asyncio
from datetime import datetime
import numpy as np

async def calculate():
    print("=== unique")
    await asyncio.to_thread(lambda: np.unique(np.ones((2000, 50000)), axis=0))
    print("=== sort")
    await asyncio.to_thread(lambda: np.sort(np.ones((2000, 50000)), axis=0))
    print("=== cumsum")
    await asyncio.to_thread(lambda: np.cumsum(np.ones((2000, 100000)), axis=0))

async def ping():
    while True:
        print("async", datetime.utcnow())
        await asyncio.sleep(0.2)

async def main():
    p1 = asyncio.create_task(ping())
    c = asyncio.create_task(calculate())
    await asyncio.wait([p1, c], return_when=asyncio.FIRST_COMPLETED)
    p1.cancel()

asyncio.run(main())

The output is as follows:

async 2021-05-21 13:20:16.308948
=== unique
async 2021-05-21 13:20:16.531135
async 2021-05-21 13:20:40.142323
=== sort
async 2021-05-21 13:20:40.343306
async 2021-05-21 13:20:40.543658
async 2021-05-21 13:20:40.743989
async 2021-05-21 13:20:40.944312
async 2021-05-21 13:20:41.144664
async 2021-05-21 13:20:41.345007
=== cumsum
async 2021-05-21 13:20:41.545523
async 2021-05-21 13:20:41.745901
async 2021-05-21 13:20:41.946271
async 2021-05-21 13:20:42.146651
async 2021-05-21 13:20:42.347021
async 2021-05-21 13:20:42.547396

It is evident that np.unique takes about 23 seconds and is never interrupted, whereas np.sort and np.cumsum are interleaved with the ping output.

If my understanding of asyncio.to_thread and the GIL is correct, anything that runs in a thread should be periodically interrupted, so that threaded programs get at least some degree of multitasking. The behavior of np.sort and np.cumsum supports this. What happens in np.unique that prevents its thread from being interrupted?
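For pure-Python code the premise does hold: CPython asks the thread holding the GIL to release it roughly every sys.getswitchinterval() seconds (0.005 by default), but this check happens in the bytecode interpreter, so C code that keeps the GIL does not get preempted by it. A minimal sketch of the pure-Python case; busy_python_loop and count_ticks_during are hypothetical helpers, not part of the original program, and the tick interval is arbitrary:

```python
import asyncio
import contextlib
import sys
import time


def busy_python_loop(seconds: float) -> int:
    """Pure-Python CPU work; the interpreter can hand the GIL to other threads."""
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations


async def count_ticks_during(seconds: float, tick: float = 0.05) -> int:
    """Count how often the event loop wakes up while the worker thread runs."""
    ticks = 0

    async def ticker() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(tick)
            ticks += 1

    ticker_task = asyncio.create_task(ticker())
    await asyncio.to_thread(busy_python_loop, seconds)
    ticker_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await ticker_task
    return ticks


if __name__ == "__main__":
    print("switch interval:", sys.getswitchinterval())
    # Roughly seconds / tick ticks are expected when the GIL is handed over regularly.
    print("ticks:", asyncio.run(count_ticks_during(0.5)))
```

With a pure-Python worker the ticker keeps running, which matches what the question observes for np.sort and np.cumsum.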

Teyras asked Oct 23 '25 06:10


1 Answer

This was a tricky one ;-)

The problem is that the GIL is not actually released in the np.unique call. The reason is the axis=0 parameter (you can verify that without it, np.unique releases the GIL and is interleaved with the ping coroutine).
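One way to run that comparison is to measure the longest gap between event-loop wake-ups while each call runs in asyncio.to_thread. This is a sketch: max_stall is a hypothetical helper, and the (500, 5000) shape is an arbitrary choice, smaller than the question's arrays, to keep the runtime short. Following the answer, the axis=0 stall would be expected to approach the duration of the whole call while the other stays near the tick interval, but actual numbers depend on the machine and the NumPy version.

```python
import asyncio
import contextlib
import time

import numpy as np


async def max_stall(fn, tick: float = 0.01) -> float:
    """Run fn() in a worker thread and return the longest gap, in seconds,
    between consecutive event-loop wake-ups while it runs."""
    stamps = [time.perf_counter()]

    async def ticker() -> None:
        while True:
            await asyncio.sleep(tick)
            stamps.append(time.perf_counter())

    ticker_task = asyncio.create_task(ticker())
    await asyncio.to_thread(fn)
    # This line only runs once the event-loop thread holds the GIL again,
    # so the final gap covers any stall at the end of fn().
    stamps.append(time.perf_counter())
    ticker_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await ticker_task
    return max(b - a for a, b in zip(stamps, stamps[1:]))


if __name__ == "__main__":
    a = np.ones((500, 5000))
    with_axis = asyncio.run(max_stall(lambda: np.unique(a, axis=0)))
    without_axis = asyncio.run(max_stall(lambda: np.unique(a)))
    print(f"axis=0:  longest stall {with_axis:.3f} s")
    print(f"no axis: longest stall {without_axis:.3f} s")
```

Recording timestamps instead of counting ticks means the stall is measured even if the ticker is cancelled before it gets to record its last wake-up.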

TL;DR: The semantics of the axis argument differ between np.sort/np.cumsum and np.unique. For np.sort/np.cumsum, the operation is vectorized "in" that axis (i.e., several 1-D arrays are sorted or summed independently), whereas np.unique operates on slices "along" that axis. These slices are non-trivial data types, so operations on them need the Python API.
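The difference in axis semantics can be seen on a tiny array (an arbitrary example, not taken from the question): np.sort and np.cumsum treat each column separately and work on plain numbers, while np.unique with axis=0 compares whole rows.

```python
import numpy as np

a = np.array([[3, 1],
              [1, 2],
              [3, 1]])

# axis=0 for np.sort / np.cumsum: each column is processed independently,
# element by element, so every element is a plain integer.
sorted_cols = np.sort(a, axis=0)      # [[1 1] [3 1] [3 2]]
cumsum_cols = np.cumsum(a, axis=0)    # [[3 1] [4 3] [7 4]]

# axis=0 for np.unique: whole rows are the units being compared,
# so the duplicate row [3, 1] is removed.
unique_rows = np.unique(a, axis=0)    # [[1 2] [3 1]]

print(sorted_cols, cumsum_cols, unique_rows, sep="\n\n")
```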

With axis=0, NumPy "slices" the array along the first axis, creating an ndarray of shape (2000, 1) in which each element is an "n-tuple of values" (its dtype is a structured dtype built from the dtypes of the individual elements); this happens at https://github.com/numpy/numpy/blob/7de0fa959e476900725d8a654775e0a38745de08/numpy/lib/arraysetops.py#L282-L294 .
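A minimal sketch of that consolidation step, assuming a 2-D input with at least one column. rows_as_records is a hypothetical name; the NumPy code linked above additionally moves the requested axis to the front, reshapes higher-dimensional input and handles edge cases.

```python
import numpy as np


def rows_as_records(ar: np.ndarray) -> np.ndarray:
    """View each row of a 2-D array as one structured element (one field per column).

    Hypothetical helper illustrating the idea behind np.unique(..., axis=0);
    assumes ar.ndim == 2 and ar.shape[1] > 0.
    """
    ar = np.ascontiguousarray(ar)  # viewing as a larger dtype needs contiguous rows
    dtype = [(f"f{i}", ar.dtype) for i in range(ar.shape[1])]
    # The m items of each row collapse into a single structured item,
    # so an (n, m) array becomes an (n, 1) array of records.
    return ar.view(dtype)


if __name__ == "__main__":
    rec = rows_as_records(np.ones((2000, 50)))
    print(rec.shape, len(rec.dtype.names))  # (2000, 1) 50
```

With the question's (2000, 50000) array, each record would have 50000 fields, and sorting the records means comparing them field by field.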

Then the ndarray.sort method is called at https://github.com/numpy/numpy/blob/7de0fa959e476900725d8a654775e0a38745de08/numpy/lib/arraysetops.py#L333. That eventually calls https://github.com/numpy/numpy/blob/7de0fa959e476900725d8a654775e0a38745de08/numpy/core/src/multiarray/item_selection.c#L1236, which tries to release the GIL at https://github.com/numpy/numpy/blob/7de0fa959e476900725d8a654775e0a38745de08/numpy/core/src/multiarray/item_selection.c#L979 , whose definition is at https://github.com/numpy/numpy/blob/7de0fa959e476900725d8a654775e0a38745de08/numpy/core/include/numpy/ndarraytypes.h#L1004-L1006 : the GIL is released only if the dtype does not set NPY_NEEDS_PYAPI. However, since the individual array elements are nontrivial types at this point, I assume their dtype does set NPY_NEEDS_PYAPI (I would expect, for example, comparisons to go through Python), so the GIL is not released.
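The NPY_NEEDS_PYAPI assumption can be inspected from Python through np.dtype.flags. In this sketch the mask 0x10 is copied by hand from the NPY_NEEDS_PYAPI definition in ndarraytypes.h, which is an assumption worth re-checking against the headers of the installed NumPy; needs_pyapi is a hypothetical helper. The result for the structured dtype is printed but not asserted, since it may differ between NumPy versions.

```python
import numpy as np

# Assumed value of NPY_NEEDS_PYAPI ("operations need Python C-API"),
# taken from numpy/core/include/numpy/ndarraytypes.h; verify for your version.
NPY_NEEDS_PYAPI = 0x10


def needs_pyapi(dtype) -> bool:
    """True if the dtype's flags include the NPY_NEEDS_PYAPI bit."""
    return bool(np.dtype(dtype).flags & NPY_NEEDS_PYAPI)


if __name__ == "__main__":
    # Same kind of structured dtype that np.unique(axis=0) builds, with few fields.
    row_dtype = np.dtype([(f"f{i}", np.float64) for i in range(5)])
    print("float64:   ", needs_pyapi(np.float64))
    print("object:    ", needs_pyapi(object))
    print("structured:", needs_pyapi(row_dtype))
```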

Cheers.

Milan Straka answered Oct 24 '25 21:10