Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django Channels Redis: Exception inside application: Lock is not acquired

Fully loaded multi-tenant Django application with 1000's of WebSockets using Daphne/Channels, running fine for a few months and suddenly tenants all calling it the support line the application running slow or outright hanging. Narrowed it down to WebSockets as HTTP REST API hits came through fast and error free.

None of the application logs or OS logs indicate some issue, so only thing to go on is the exception noted below. It happened over and over again here and there throughout 2 days.

I don't expect any deep debugging help, just some off-the-cuff advice on possibilities.

AWS Linux 1
Python 3.6.4
Elasticache Redis 5.0
channels==2.4.0
channels-redis==2.4.2
daphne==2.5.0
Django==2.2.13

Split configuration HTTP served by uwsgi, daphne serves asgi, Nginx

May 10 08:08:16 prod-b-web1: [pid 15053] [version 119.5.10.5086] [tenant_id -] [domain_name -] [pathname /opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/daphne/server.py] [lineno 288] [priority ERROR] [funcname application_checker] [request_path -] [request_method -] [request_data -] [request_user -] [request_stack -] Exception inside application: Lock is not acquired.
Traceback (most recent call last):
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 435, in receive
    real_channel
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 484, in receive_single
    await self.receive_clean_locks.acquire(channel_key)
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 152, in acquire
    return await self.locks[channel].acquire()
  File "/opt/python3.6/lib/python3.6/asyncio/locks.py", line 176, in acquire
    yield from fut
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/sessions.py", line 183, in __call__
    return await self.inner(receive, self.send)
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/middleware.py", line 41, in coroutine_call
    await inner_instance(receive, send)
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/consumer.py", line 59, in __call__
    [receive, self.channel_receive], self.dispatch
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/utils.py", line 58, in await_many_dispatch
    await task
  File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 447, in receive
    self.receive_lock.release()
  File "/opt/python3.6/lib/python3.6/asyncio/locks.py", line 201, in release
    raise RuntimeError('Lock is not acquired.')
RuntimeError: Lock is not acquired.
like image 332
LiteWait Avatar asked May 15 '21 22:05

LiteWait


Video Answer


1 Answers

First, lets have a look at the source of the RuntimeError: Lock is not acquired. error. As given by the traceback, the release() method in the file /opt/python3.6/lib/python3.6/asyncio/locks.py is defined like so:

    def release(self):
        """Release a lock.

        When the lock is locked, reset it to unlocked, and return.
        If any other coroutines are blocked waiting for the lock to become
        unlocked, allow exactly one of them to proceed.

        When invoked on an unlocked lock, a RuntimeError is raised.

        There is no return value.
        """
        if self._locked:
            self._locked = False
            self._wake_up_first()
        else:
            raise RuntimeError('Lock is not acquired.')

A primitive lock is a synchronization primitive that is not owned by a particular thread when locked.

When attempting to release an unlocked lock by calling the release() method, the RuntimeError will be raised, as the method should only be called in the locked state. The state changes to unlocked when called in the locked state.

Now for the previous error raised in the acquire() method in the same file, the acquire() method is defined like so:

    async def acquire(self):
        """Acquire a lock.

        This method blocks until the lock is unlocked, then sets it to
        locked and returns True.
        """
        if (not self._locked and (self._waiters is None or
                all(w.cancelled() for w in self._waiters))):
            self._locked = True
            return True

        if self._waiters is None:
            self._waiters = collections.deque()
        fut = self._loop.create_future()
        self._waiters.append(fut)

        # Finally block should be called before the CancelledError
        # handling as we don't want CancelledError to call
        # _wake_up_first() and attempt to wake up itself.
        try:
            try:
                await fut
            finally:
                self._waiters.remove(fut)
        except exceptions.CancelledError:
            if not self._locked:
                self._wake_up_first()
            raise

        self._locked = True
        return True

So in order for the concurrent.futures._base.CancelledError error you're getting to be raised, the await fut must've caused the issue.

To fix it, you can have a look at Awaiting an asyncio.Future raises concurrent.futures._base.CancelledError instead of waiting for a value/exception to be set

Basically, you might have an awaitable in your code that you didn't await, and by not awaiting it, you never handed control back to the event loop or store the awaitable, causing it to be immediately cleaned up, completely cancelling it (and all of the awaitables it controlled).

Simply make sure you await the results of the awaitables in your code, finding any you missed.

like image 108
Ann Zen Avatar answered Oct 02 '22 12:10

Ann Zen