My real task is to recursively traverse a remote directory using paramiko with multi-threading. For the sake of simplicity, I use the local filesystem to demonstrate it:
from pathlib import Path
from typing import List
from concurrent.futures import ThreadPoolExecutor, Executor

def listdir(root: Path, executor: Executor) -> List[Path]:
    if root.is_dir():
        xss = executor.map(lambda d: listdir(d, executor), root.glob('*'))
        return sum(xss, [])
    return [root]

with ThreadPoolExecutor(4) as e:
    listdir(Path('.'), e)
However, the above code never finishes.
What's wrong with my code? And how can I fix it (preferably using Executor rather than raw Thread objects)?
EDIT: I have confirmed @Sraw's answer with the following code:
In [4]: def listdir(root: Path, executor: Executor) -> List[Path]:
   ...:     print(f'Enter {root}', flush=True)
   ...:     if root.is_dir():
   ...:         xss = executor.map(lambda d: listdir(d, executor), root.glob('*'))
   ...:         return sum(xss, [])
   ...:     return [root]
   ...:

In [5]: with ThreadPoolExecutor(4) as e:
   ...:     listdir(Path('.'), e)
   ...:
Enter .
Enter NonRestrictedShares
Enter corporateActionData
Enter RiskModelAnnualEPS
Enter juyuan
There is a deadlock in your code.
Because you are using ThreadPoolExecutor(4), there are only four worker threads in this executor, so you cannot run more than four tasks at the same time.
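The effect can be reproduced in miniature. The sketch below is only an illustration, not the code from the question: the function name `outer`, the single-worker pool, and the one-second timeout are all assumptions chosen to keep it small. The task that occupies the only worker submits a second task and waits for its result, but the second task cannot start while the first still holds the thread. The timeout is there so the example terminates instead of hanging.

```python
import concurrent.futures
from concurrent.futures import Executor, ThreadPoolExecutor


def outer(executor: Executor) -> str:
    # This function runs on the pool's only worker thread.
    # The inner task is queued, but no worker is free to pick it up.
    inner = executor.submit(lambda: "inner finished")
    try:
        # Without a timeout this call would block forever.
        return inner.result(timeout=1)
    except concurrent.futures.TimeoutError:
        return "timed out: inner never started"


with ThreadPoolExecutor(1) as executor:
    result = executor.submit(outer, executor).result()

print(result)
```

Here the timeout branch is taken, because the inner task only gets a chance to run after `outer` has returned and released the worker.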
Imagine the following simple structure:
test
----script.py
----test1
--------test2
------------test3
----------------test4
--------------------test5
If you run python script.py, the first worker thread handles test1, the second one handles test1/test2, the third one handles test1/test2/test3, and the fourth one handles test1/test2/test3/test4. Now the worker threads are exhausted, and each of them is blocked waiting for the result of its subdirectory. But there is another task, test1/test2/test3/test4/test5, that has been put on the work queue, and no thread is free to run it.
So it will hang forever.
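One possible way around this, sketched here under my own assumptions rather than taken from the question, is to make sure a worker never waits on another task. In the sketch, the hypothetical helper `scan_one` lists a single directory level and returns immediately, and only the calling thread submits follow-up tasks for the subdirectories it reports. The names `scan_one`, `max_workers`, and `found` are illustrative, and for brevity this `listdir` creates its own pool instead of receiving an Executor argument.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from pathlib import Path
from typing import List, Tuple


def scan_one(directory: Path) -> Tuple[List[Path], List[Path]]:
    """List one directory level; never waits on another task."""
    files: List[Path] = []
    subdirs: List[Path] = []
    for entry in directory.iterdir():
        (subdirs if entry.is_dir() else files).append(entry)
    return files, subdirs


def listdir(root: Path, max_workers: int = 4) -> List[Path]:
    if not root.is_dir():
        return [root]
    found: List[Path] = []
    with ThreadPoolExecutor(max_workers) as executor:
        pending = {executor.submit(scan_one, root)}
        while pending:
            # Block in the calling thread only, never inside a worker.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                files, subdirs = future.result()
                found.extend(files)
                # Queue one task per subdirectory discovered so far.
                pending |= {executor.submit(scan_one, d) for d in subdirs}
    return found
```

Calling `listdir(Path('.'))` should then return the files under the current directory regardless of how deep the tree is, since the depth no longer ties up worker threads. The order of the returned paths depends on which tasks finish first, so it may differ from a sequential traversal.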