A new thread for running a cell in IPython/Jupyter notebook

Sometimes it takes a long time to run a single cell, while it is running, I would like to write and run other cells in the same notebook, accessing the variables in the same context.

Is there any ipython magic that can be used such that when it is added to a cell, running the cell will automatically create a new thread and run with shared global data in the notebook?

asked Aug 18 '15 20:08 by chentingpc


2 Answers

This may not be a full answer, but rather a direction toward one. I have not seen anything like that, but I am interested in this too.

My current findings suggest that one needs to define their own custom cell magic. Good references would be the custom cell magic section in the documentation and two examples I would consider:

  • memit: magic memory usage benching for IPython https://gist.github.com/vene/3022718
  • Illustrating Python multithreading vs multiprocessing: http://nathangrigg.net/2015/04/python-threading-vs-processes/

Both of those links wrap the code in a thread. That could be a starting point; a minimal sketch of such a magic follows.
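
To make the idea concrete, here is a minimal sketch of such a magic (the name thread_cell and the absence of any locking are my own choices, not an established recipe). It registers a custom cell magic that executes the cell body in a background thread against the notebook's global namespace, so variables assigned there become visible to other cells:

import threading

from IPython import get_ipython
from IPython.core.magic import register_cell_magic

@register_cell_magic
def thread_cell(line, cell):
    ip = get_ipython()

    def run():
        # exec against the user namespace, so the cell shares the notebook's globals
        exec(cell, ip.user_global_ns)

    t = threading.Thread(target=run, name=line.strip() or None)
    t.start()
    return t  # handle for joining or inspecting the thread later

A cell starting with %%thread_cell then returns immediately while its body runs in the background. Note there is no synchronization, so cells racing on the same variables is your responsibility.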

UPDATE: The ngcm-tutorial on GitHub has a description of the background jobs class:

# github.com/jupyter/ngcm-tutorial/blob/master/Day-1/IPython%20Kernel/Background%20Jobs.ipynb
import sys
import time

from IPython.lib import backgroundjobs as bg

jobs = bg.BackgroundJobManager()

def printfunc(interval=1, reps=5):
    for n in range(reps):
        time.sleep(interval)
        print('In the background... %i' % n)
        sys.stdout.flush()
    print('All done!')
    sys.stdout.flush()

jobs.new('printfunc(1,3)')
jobs.status()
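
As a side note, BackgroundJobManager.new also accepts a callable followed by its arguments instead of an expression string, so the same job can be started without quoting:

jobs.new(printfunc, 1, 3)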

UPDATE 2: Another option:

from IPython.display import display
from ipywidgets import IntProgress

import threading

class App(object):
    def __init__(self, nloops=2000):
        self.nloops = nloops
        self.pb = IntProgress(description='Thread loops', min=0, max=self.nloops)

    def start(self):
        display(self.pb)
        while self.pb.value < self.nloops:
            self.pb.value += 1
        self.pb.color = 'red'

app = App(nloops=20000)

t = threading.Thread(target=app.start)
t.start()
#t.join()
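
Side note: self.pb.color relies on an old ipywidgets API; in current ipywidgets versions the closest equivalent would be setting self.pb.bar_style = 'danger' when the loop finishes.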
answered Nov 09 '22 06:11 by kpykc

Here is a little snippet that I came up with:

def jobs_manager():
    from IPython.lib.backgroundjobs import BackgroundJobManager
    from IPython.core.magic import register_line_magic
    from IPython import get_ipython

    jobs = BackgroundJobManager()

    @register_line_magic
    def job(line):
        ip = get_ipython()
        jobs.new(line, ip.user_global_ns)

    return jobs

It uses the built-in IPython module IPython.lib.backgroundjobs, so the code is small and simple and no new dependencies are introduced.

I use it like this:

jobs = jobs_manager()

%job [fetch_url(_) for _ in urls]  # saves html file to disk
Starting job # 0 in a separate thread.

Then you can monitor the state with:

jobs.status()

Running jobs:
1 : [fetch_url(_) for _ in urls]

Dead jobs:
0 : [fetch_url(_) for _ in urls]

If a job fails, you can inspect the stack trace with:

jobs.traceback(0) 
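
A related detail: if I read IPython.lib.backgroundjobs correctly, each job object also keeps the value of its expression in a result attribute once it finishes, so the comprehension's output is not lost:

jobs.all[0].result  # the list built by the comprehension, once the job completes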

There is no built-in way to kill a job, so I use this dirty hack carefully:

def kill_thread(thread):
    import ctypes

    id = thread.ident
    code = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(id),
        ctypes.py_object(SystemError)
    )
    if code == 0:
        raise ValueError('invalid thread id')
    elif code != 1:
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_long(id),
            ctypes.c_long(0)
        )
        raise SystemError('PyThreadState_SetAsyncExc failed')

It asynchronously raises SystemError in the given thread. So, to kill a job, I do:

kill_thread(jobs.all[1]) 

To kill all running jobs, I do:

for thread in jobs.running:
    kill_thread(thread)
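
Killed and finished jobs still clutter jobs.status(); as far as I remember, the manager's flush() method clears completed and dead jobs from that listing:

jobs.flush()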

I like to use %job with the widget-based progress bar https://github.com/alexanderkuk/log-progress like this:

%job [fetch_url(_) for _ in log_progress(urls, every=1)] 

http://g.recordit.co/iZJsJm8BOL.gif

One can even use %job instead of multiprocessing.pool.ThreadPool:

for chunk in get_chunks(urls, 3):
    %job [fetch_url(_) for _ in log_progress(chunk, every=1)]

http://g.recordit.co/oTVCwugZYk.gif

Some obvious problems with this code:

  1. You cannot use arbitrary code in %job; for example, there can be no assignments and no prints. So I use it with routines that store their results on disk.

  2. Sometimes the dirty hack in kill_thread does not work. I think that is why IPython.lib.backgroundjobs does not have this functionality by design: if the thread is in a system call such as sleep or read, the exception is ignored.

  3. It uses threads. Python has the GIL, so %job cannot be used for heavy computations that run in Python bytecode (see the process-based sketch after this list).
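
For the CPU-bound case in point 3, a process pool is the usual workaround, at the price of losing the shared notebook namespace (arguments and results cross process boundaries by pickling). Here is a minimal sketch, where cpu_heavy is a made-up placeholder; note that functions submitted to a process pool must be picklable, which can be awkward for functions defined inside a notebook on platforms that spawn rather than fork:

from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # stand-in for a pure-Python computation that would otherwise hold the GIL
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(cpu_heavy, [10 ** 6] * 4))
    print(results)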

answered Nov 09 '22 08:11 by alexanderkuk