_multiprocessing.SemLock is not implemented when running on AWS Lambda

Tags:

python-multiprocessing

I have a short code that uses the multiprocessing package and works fine on my local machine.

When I uploaded to AWS Lambda and run there, I got the following error (stacktrace trimmed):

[Errno 38] Function not implemented: OSError Traceback (most recent call last):   File "/var/task/recorder.py", line 41, in record     pool = multiprocessing.Pool(10)   File "/usr/lib64/python2.7/multiprocessing/__init__.py", line 232, in Pool     return Pool(processes, initializer, initargs, maxtasksperchild)   File "/usr/lib64/python2.7/multiprocessing/pool.py", line 138, in __init__     self._setup_queues()   File "/usr/lib64/python2.7/multiprocessing/pool.py", line 234, in _setup_queues     self._inqueue = SimpleQueue()   File "/usr/lib64/python2.7/multiprocessing/queues.py", line 354, in __init__     self._rlock = Lock()   File "/usr/lib64/python2.7/multiprocessing/synchronize.py", line 147, in __init__     SemLock.__init__(self, SEMAPHORE, 1, 1)   File "/usr/lib64/python2.7/multiprocessing/synchronize.py", line 75, in __init__     sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue) OSError: [Errno 38] Function not implemented

Can it be that a part of python's core packages in not implemented? I have no idea what am I running on underneath so I can't login there and debug.

Any ideas how can I run multiprocessing on Lambda?

627

asked Nov 30 '15 18:11

2 Answers

As far as I can tell, multiprocessing won't work on AWS Lambda because the execution environment/container is missing /dev/shm - see https://forums.aws.amazon.com/thread.jspa?threadID=219962 (login may be required).

No word (that I can find) on if/when Amazon will change this. I also looked at other libraries e.g. https://pythonhosted.org/joblib/parallel.html will fallback to /tmp (which we know DOES exist) if it can't find /dev/shm, but that doesn't actually solve the problem.

172

answered Sep 22 '22 23:09

glennji

You can run routines in parallel on AWS Lambda using Python's multiprocessing module but you can't use Pools or Queues as noted in other answers. A workable solution is to use Process and Pipe as outlined in this article https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda/

While the article definitely helped me get to a solution (shared below) there are a few things to be aware of. First, the Process and Pipe based solution is not as fast as the built in map function in Pool, though I did see nearly linear speed-ups as I increased the available memory/CPU resources in my Lambda function. Second there is a fair bit of management that has to be undertaken when developing multiprocessing functions in this way. I suspect this is at least partly why my solution is slower than the built in methods. If anyone has suggestions to speed it up I'd love to hear them! Finally, while the article notes that multiprocessing is useful for offloading asynchronous processes, there are other reasons to use multiprocessing such as lots of intensive math ops which is what I was trying to do. In the end I was happy enough with the performance improvement as it was much better than sequential execution!

The code:

# Python 3.6 from multiprocessing import Pipe, Process  def myWorkFunc(data, connection):     result = None      # Do some work and store it in result      if result:         connection.send([result])     else:         connection.send([None])   def myPipedMultiProcessFunc():      # Get number of available logical cores     plimit = multiprocessing.cpu_count()      # Setup management variables     results = []     parent_conns = []     processes = []     pcount = 0     pactive = []     i = 0      for data in iterable:         # Create the pipe for parent-child process communication         parent_conn, child_conn = Pipe()         # create the process, pass data to be operated on and connection         process = Process(target=myWorkFunc, args=(data, child_conn,))         parent_conns.append(parent_conn)         process.start()         pcount += 1          if pcount == plimit: # There is not currently room for another process             # Wait until there are results in the Pipes             finishedConns = multiprocessing.connection.wait(parent_conns)             # Collect the results and remove the connection as processing             # the connection again will lead to errors             for conn in finishedConns:                 results.append(conn.recv()[0])                 parent_conns.remove(conn)                 # Decrement pcount so we can add a new process                 pcount -= 1      # Ensure all remaining active processes have their results collected     for conn in parent_conns:         results.append(conn.recv()[0])         conn.close()      # Process results as needed

answered Sep 21 '22 23:09

E Morrow

Related questions
                            
                                Python multiprocessing module: join processes with timeout
                            
                                Running multiple tensorflow sessions concurrently
                            
                                How to unit-test code that uses python-multiprocessing
                            
                                How to efficiently run multiple Pytorch Processes / Models at once ? Traceback: The paging file is too small for this operation to complete
                            
                                In Kubernetes, do we still need multiprocess/gunicorn?
                            
                                Can I run multiprocessing Python programs on a single core machine?
                            
                                How do I name the processes in a multiprocessing.pool?
                            
                                How to run independent transformations in parallel using PySpark?
                            
                                Speed up reading multiple pickle files
                            
                                Python requests - threads/processes vs. IO
                            
                                How to efficiently use asyncio when calling a method on a BaseProxy?
                            
                                Python: TypeError: Pickling an AuthenticationString object is disallowed for security reasons
                            
                                Can I use map / imap / imap_unordered with functions with no arguments?
                            
                                celery: daemonic processes are not allowed to have children
                            
                                How can I abort a task in a multiprocessing.Pool after a timeout?
                            
                                Leveraging "Copy-on-Write" to Copy Data to Multiprocessing.Pool() Worker Processes
                            
                                How to profile multiple subprocesses using Python multiprocessing and memory_profiler?
                            
                                how to to terminate process using python's multiprocessing
                            
                                Python Shared Memory Dictionary for Mapping Big Data
                            
                                Dask: How would I parallelize my code with dask delayed?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

_multiprocessing.SemLock is not implemented when running on AWS Lambda

Tags:

aws-lambda

python-multiprocessing

Zach Moshe

People also ask

2 Answers

glennji

E Morrow

Recent Activity

Donate For Us