I have a Celery task called simple_theano_tasks:

@app.task(bind=True, queue='test')
def simple_theano_tasks(self):
    import theano, numpy as np
    my_array = np.zeros((0,), dtype=theano.config.floatX)
    shared = theano.shared(my_array, name='my_variable', borrow=True)
    print 'Done. Shared value is {}'.format(shared.get_value())
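For context, the task is dispatched to the 'test' queue with a plain Celery call, e.g. (a minimal sketch, not the exact call site from my project):

# Hypothetical dispatch snippet: enqueue the task on the 'test' queue and wait for the result.
from my_project.tasks import simple_theano_tasks

result = simple_theano_tasks.apply_async(queue='test')
print result.get(timeout=30)  # re-raises on the caller side if the task failed on the worker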
When Theano is configured to use the CPU, everything works as expected (no error):
$ THEANO_FLAGS=device=cpu celery -A my_project worker -c1 -l info -Q test
[INFO/MainProcess] Received task: my_project.tasks.simple_theano_tasks[xxxx]
[WARNING/Worker-1] Done. Shared value is []
[INFO/MainProcess] Task my_project.tasks.simple_theano_tasks[xxxx] succeeded in 0.00407959899985s
Now, when I do the exact same thing with the GPU enabled, Theano (or CUDA) raises an error:
$ THEANO_FLAGS=device=gpu celery -A my_project worker -c1 -l info -Q test
...
Using gpu device 0: GeForce GTX 670M (CNMeM is enabled)
...
[INFO/MainProcess] Received task: my_project.tasks.simple_theano_tasks[xxx]
[ERROR/MainProcess] Task my_project.tasks.simple_theano_tasks[xxx] raised unexpected: RuntimeError("Cuda error 'initialization error' while copying %lli data element to device memory",)
Traceback (most recent call last):
File "/.../local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/.../local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/.../my_project/tasks.py", line 362, in simple_theano_tasks
shared = theano.shared(my_array, name='my_variable', borrow=True)
File "/.../local/lib/python2.7/site-packages/theano/compile/sharedvalue.py", line 247, in shared
allow_downcast=allow_downcast, **kwargs)
File "/.../local/lib/python2.7/site-packages/theano/sandbox/cuda/var.py", line 229, in float32_shared_constructor
deviceval = type_support_filter(value, type.broadcastable, False, None)
RuntimeError: Cuda error 'initialization error' while copying %lli data element to device memory
Finally, when I run the exact same code in a plain Python shell, there is no error:
$ THEANO_FLAGS=device=gpu python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import theano, numpy as np
Using gpu device 0: GeForce GTX 670M (CNMeM is enabled)
>>> my_array = np.zeros((0,), dtype=theano.config.floatX)
>>> shared = theano.shared(my_array, name='my_variable', borrow=True)
>>> print 'Done. Shared value is {}'.format(shared.get_value())
Done. Shared value is []
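My suspicion (not confirmed by the logs above) is that the plain shell uses the GPU in the same process that initialized it, while the prefork Celery worker runs the task in a forked child process that inherits a CUDA context it cannot use. A rough way to mimic that outside Celery would be something like this (hypothetical sketch, not from my project):

# Hypothetical reproduction sketch: initialize Theano (and therefore CUDA, with
# device=gpu) in the parent process, then fork and create a GPU shared variable
# in the child, similar to what a prefork worker does.
import os
import numpy as np
import theano  # with device=gpu, this binds the GPU in the parent process

pid = os.fork()
if pid == 0:
    # Child process: the CUDA context created before the fork is not usable here,
    # so this is expected to fail in the same way as the Celery task.
    my_array = np.zeros((0,), dtype=theano.config.floatX)
    shared = theano.shared(my_array, name='my_variable', borrow=True)
    print 'Child: shared value is {}'.format(shared.get_value())
    os._exit(0)
else:
    os.waitpid(pid, 0)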
Does anyone have an idea of why this happens under Celery, or how to fix it?
Some additional context:
I am using [email protected] and [email protected]
"~/.theanorc" file
[global]
floatX=float32
device=gpu
[mode]=FAST_RUN
[nvcc]
fastmath=True
[lib]
cnmem=0.1
[cuda]
root=/usr/local/cuda
A workaround is to defer the Theano import into the task itself and bind the GPU there with theano.sandbox.cuda.use('gpu'). The Celery task is now:
@app.task(bind=True, queue='test')
def simple_theano_tasks(self):
    # At this point, no theano import statements have been processed, and so the device is unbound
    import theano, numpy as np
    import theano.sandbox.cuda
    theano.sandbox.cuda.use('gpu')  # enable gpu
    my_array = np.zeros((0,), dtype=theano.config.floatX)
    shared = theano.shared(my_array, name='my_variable', borrow=True)
    print 'Done. Shared value is {}'.format(shared.get_value())
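An alternative sketch (an assumption on my part, not taken from the article below) is to do the GPU binding once per worker process via Celery's worker_process_init signal, so individual tasks don't have to call use('gpu') themselves:

# Sketch: initialize Theano and bind the GPU once per forked worker process
# using a Celery signal, instead of inside every task. Assumes the same `app`
# instance as above.
from celery.signals import worker_process_init

@worker_process_init.connect
def init_theano(**kwargs):
    # Runs in the child worker process, after the fork, so the CUDA context
    # belongs to the process that will actually execute the tasks.
    import theano.sandbox.cuda
    theano.sandbox.cuda.use('gpu')

@app.task(bind=True, queue='test')
def simple_theano_tasks(self):
    import theano, numpy as np
    my_array = np.zeros((0,), dtype=theano.config.floatX)
    shared = theano.shared(my_array, name='my_variable', borrow=True)
    print 'Done. Shared value is {}'.format(shared.get_value())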
Note: I found the solution while reading this article about using multiple GPUs.