
How to recursively traverse a directory using ThreadPoolExecutor?

My real task is to recursively traverse a remote directory using paramiko with multi-threading. For the sake of simplicity, I use the local filesystem to demonstrate it:

from pathlib import Path
from typing import List
from concurrent.futures import ThreadPoolExecutor, Executor

def listdir(root: Path, executor: Executor) -> List[Path]:
    if root.is_dir():
        xss = executor.map(lambda d: listdir(d, executor), root.glob('*'))
        return sum(xss, [])
    return [root]

with ThreadPoolExecutor(4) as e:
    listdir(Path('.'), e)

However, the above code runs without ever finishing.

What's wrong with my code? And how can I fix it (preferably using an Executor rather than raw Thread objects)?

EDIT: I have confirmed @Sraw's answer with the following code:

In [4]: def listdir(root: Path, executor: Executor) -> List[Path]:
   ...:     print(f'Enter {root}', flush=True)
   ...:     if root.is_dir():
   ...:         xss = executor.map(lambda d: listdir(d, executor), root.glob('*'))
   ...:         return sum(xss, [])
   ...:     return [root]
   ...:

In [5]: with ThreadPoolExecutor(4) as e:
   ...:     listdir(Path('.'), e)
   ...:
Enter .
Enter NonRestrictedShares
Enter corporateActionData
Enter RiskModelAnnualEPS
Enter juyuan
Eastsun asked May 18 '18 01:05



1 Answer

There is a deadlock in your code.

Because you are using ThreadPoolExecutor(4), the executor has only four worker threads, so it cannot run more than four tasks at the same time.
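The exhaustion can be reproduced in isolation. The following is a minimal sketch, not the code from the question: the helper names outer and run are hypothetical, and a one-second timeout is added only so that the demonstration returns instead of hanging. A task submits a child task to the same pool and waits for it; with a single worker, the child can never start while the parent is waiting.

```python
from concurrent.futures import Executor, ThreadPoolExecutor, TimeoutError


def outer(executor: Executor) -> str:
    # Submit a child task to the same pool, then block on its result.
    inner = executor.submit(lambda: "inner finished")
    try:
        # Without the timeout this call would wait forever when no worker is free.
        return inner.result(timeout=1)
    except TimeoutError:
        return "timed out: no free worker for the inner task"


def run(workers: int) -> str:
    # Run outer() inside a pool of the given size and return what it reports.
    with ThreadPoolExecutor(workers) as executor:
        return executor.submit(outer, executor).result()
```

With run(1) the only worker is occupied by outer, so the inner task stays in the queue until the timeout fires; with run(2) a second worker is available and the inner task completes.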

Imagine the following simple structure:

test
----script.py
----test1
--------test2
------------test3
----------------test4
--------------------test5

If you run python script.py, the first worker thread handles test1, the second handles test1/test2, the third handles test1/test2/test3, and the fourth handles test1/test2/test3/test4. Each of these workers is blocked waiting for the result of the subdirectory task it submitted, so the worker threads are now exhausted. The task for test1/test2/test3/test4/test5 has been added to the work queue, but no worker is free to pick it up.

So it will hang forever.
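One way to avoid the deadlock while still using an Executor is to make sure that no task ever waits on another task. The sketch below is one possible approach under that assumption, not the only fix: a hypothetical helper scan lists a single directory level, and the calling thread (not a worker) collects results and submits the subdirectories. The result order may differ from the recursive version.

```python
from concurrent.futures import Executor, FIRST_COMPLETED, ThreadPoolExecutor, wait
from pathlib import Path
from typing import List, Tuple


def scan(directory: Path) -> Tuple[List[Path], List[Path]]:
    # List one directory level only; this task never waits on another task.
    files: List[Path] = []
    subdirs: List[Path] = []
    for entry in directory.iterdir():
        (subdirs if entry.is_dir() else files).append(entry)
    return files, subdirs


def listdir(root: Path, executor: Executor) -> List[Path]:
    if not root.is_dir():
        return [root]
    results: List[Path] = []
    pending = {executor.submit(scan, root)}
    while pending:
        # Block in the calling thread until at least one scan has finished.
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for future in done:
            files, subdirs = future.result()
            results.extend(files)
            # Schedule the newly discovered subdirectories from the caller.
            pending |= {executor.submit(scan, d) for d in subdirs}
    return results
```

Called as in the question, with listdir(Path('.'), e) inside with ThreadPoolExecutor(4) as e:, this should return regardless of how deep the tree is, because workers only ever run scan and are released as soon as one directory has been listed.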

Sraw answered Nov 10 '22 02:11
