Execute multiple notebooks in parallel in PySpark on Databricks

The question is simple:

master_dim.py calls dim_1.py and dim_2.py so that they execute in parallel. Is this possible in Databricks PySpark?

The image below shows what I am trying to do. It errors for some reason. Am I missing something here?

enter image description here

Chandra asked Jun 09 '26 07:06

1 Answer

For anyone else looking for this, here is what worked for me:

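from multiprocessing.pool import ThreadPool

# dbutils is available only inside a Databricks notebook
notebooks = ['dim_1', 'dim_2']
pool = ThreadPool(5)  # up to 5 notebooks run concurrently
# Each worker thread runs one notebook and passes its name as the "input-data" argument
pool.map(lambda path: dbutils.notebook.run("/Test/Threading/" + path, timeout_seconds=60, arguments={"input-data": path}), notebooks)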
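
If one of the notebooks fails, pool.map re-raises the exception and the results of the other notebooks are lost. A possible variation, sketched here with concurrent.futures instead of multiprocessing.pool, records success or failure per notebook. The helper name run_notebooks_in_parallel and its parameters are my own, not part of the Databricks API; the runner is passed in as an argument so the function can be tried outside Databricks, and it is assumed to accept the same (path, timeout_seconds, arguments) call as dbutils.notebook.run.

```python
from concurrent.futures import ThreadPoolExecutor


def run_notebooks_in_parallel(run_notebook, names, base_path="/Test/Threading/",
                              timeout_seconds=60, max_workers=5):
    """Run each notebook on its own thread.

    Returns a dict mapping notebook name to (ok, result_or_error_message).
    run_notebook is assumed to take (path, timeout_seconds, arguments).
    """

    def run_one(name):
        try:
            # Same call shape as in the snippet above: path, timeout, arguments.
            result = run_notebook(base_path + name, timeout_seconds,
                                  {"input-data": name})
            return name, (True, result)
        except Exception as exc:
            # Record the failure instead of aborting the whole batch.
            return name, (False, str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return dict(executor.map(run_one, names))


# Inside a Databricks notebook (dbutils only exists there):
# results = run_notebooks_in_parallel(dbutils.notebook.run, ["dim_1", "dim_2"])
```

Threads are enough here because each call mostly waits for the child notebook to finish, so the work is not bound by the driver's Python interpreter.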
Chandra answered Jun 10 '26 22:06



