My real task is to recursively traverse a remote directory using paramiko with multi-threading. For simplicity, I use the local filesystem to demonstrate it:
```python
from pathlib import Path
from typing import List
from concurrent.futures import ThreadPoolExecutor, Executor


def listdir(root: Path, executor: Executor) -> List[Path]:
    if root.is_dir():
        xss = executor.map(lambda d: listdir(d, executor), root.glob('*'))
        return sum(xss, [])
    return [root]


with ThreadPoolExecutor(4) as e:
    listdir(Path('.'), e)
```
However, the code above never finishes.
What's wrong with my code, and how can I fix it (preferably using an Executor rather than raw Thread objects)?
EDIT: I have confirmed @Sraw's answer with the following code:
```
In [4]: def listdir(root: Path, executor: Executor) -> List[Path]:
   ...:     print(f'Enter {root}', flush=True)
   ...:     if root.is_dir():
   ...:         xss = executor.map(lambda d: listdir(d, executor), root.glob('*'))
   ...:         return sum(xss, [])
   ...:     return [root]
   ...:

In [5]: with ThreadPoolExecutor(4) as e:
   ...:     listdir(Path('.'), e)
   ...:
Enter .
Enter NonRestrictedShares
Enter corporateActionData
Enter RiskModelAnnualEPS
Enter juyuan
```
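There is a deadlock in your code.
Because you are using ThreadPoolExecutor(4), the executor has only four worker threads, so it cannot run more than four tasks at the same time. Each listdir call running in a worker submits its children with executor.map and then blocks in sum(...) until all of them finish, so a worker that is waiting can never pick up another task.
Imagine the following simplest structure: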
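The same lock-up can be reproduced with a single worker and a single level of nesting. This is a minimal sketch; `parent_task` and `run` are hypothetical names, and the timeout exists only so the demo returns instead of hanging the way the original code does:

```python
from concurrent.futures import Executor, ThreadPoolExecutor, TimeoutError
from typing import Optional


def parent_task(executor: Executor, timeout: float) -> Optional[int]:
    # This task occupies a worker, submits a child, then blocks waiting for it.
    child = executor.submit(lambda: 42)
    try:
        return child.result(timeout=timeout)
    except TimeoutError:
        # No free worker picked up the child in time: the pool is deadlocked.
        return None


def run(max_workers: int, timeout: float = 1.0) -> Optional[int]:
    with ThreadPoolExecutor(max_workers) as executor:
        return executor.submit(parent_task, executor, timeout).result()
```

With `run(1)` the child sits in the queue behind its own parent and the call returns `None`; with `run(2)` a second worker is free to run the child, so it returns `42`. The recursive listdir is the same situation repeated once per directory level.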
test
----script.py
----test1
--------test2
------------test3
----------------test4
--------------------test5
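To reproduce this, the tree can be created in a temporary directory. A minimal sketch; `make_nested_tree` is a hypothetical helper, not part of the original code:

```python
import tempfile
from pathlib import Path


def make_nested_tree(root: Path, depth: int = 5) -> Path:
    """Create root/script.py and root/test1/test2/.../test<depth>; return the deepest dir."""
    (root / "script.py").touch()
    deepest = root.joinpath(*(f"test{i}" for i in range(1, depth + 1)))
    deepest.mkdir(parents=True)
    return deepest
```

For example, `make_nested_tree(Path(tempfile.mkdtemp()))` builds the five-level layout shown above, one level deeper than the four worker threads of ThreadPoolExecutor(4).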
If you run python script.py, the first worker thread handles test1, the second handles test1/test2, the third handles test1/test2/test3, and the fourth handles test1/test2/test3/test4. Each of them is blocked waiting for its child's result, so all worker threads are now occupied. But another task, test1/test2/test3/test4/test5, has been put into the work queue, and no thread is free to run it.
So it will hang forever.
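One way around this, sketched under my own assumptions rather than as the definitive fix, is to keep the recursion out of the worker threads. Each task lists exactly one directory and returns immediately; only the calling thread waits (via `concurrent.futures.wait` with `FIRST_COMPLETED`) and submits the subdirectories it discovers. Since no worker ever blocks on another task, the pool size no longer limits the depth. `scan_one` is a hypothetical helper; for the paramiko case, the local `iterdir()`/`is_dir()` calls inside it would have to be replaced by the equivalent remote listing (not shown here).

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor, FIRST_COMPLETED, wait
from pathlib import Path
from typing import List, Set, Tuple


def scan_one(directory: Path) -> Tuple[List[Path], List[Path]]:
    """List one directory (no recursion) and split entries into files and subdirectories."""
    files: List[Path] = []
    subdirs: List[Path] = []
    for entry in directory.iterdir():
        (subdirs if entry.is_dir() else files).append(entry)
    return files, subdirs


def listdir(root: Path, executor: Executor) -> List[Path]:
    """Collect all non-directory paths under root; only the calling thread waits."""
    if not root.is_dir():
        return [root]
    result: List[Path] = []
    pending: Set[Future] = {executor.submit(scan_one, root)}
    while pending:
        # Block here, in the caller, until at least one listing finishes.
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for future in done:
            files, subdirs = future.result()
            result.extend(files)
            # Schedule each discovered subdirectory as an independent task.
            pending.update(executor.submit(scan_one, d) for d in subdirs)
    return result
```

Usage stays the same as in the question, `with ThreadPoolExecutor(4) as e: files = listdir(Path('.'), e)`, and it also terminates with a single worker. The order of the returned paths depends on which listings finish first, so sort the result if you need a stable order.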