Fully loaded multi-tenant Django application with 1000's of WebSockets using Daphne/Channels, running fine for a few months and suddenly tenants all calling it the support line the application running slow or outright hanging. Narrowed it down to WebSockets as HTTP REST API hits came through fast and error free.
None of the application logs or OS logs indicate some issue, so only thing to go on is the exception noted below. It happened over and over again here and there throughout 2 days.
I don't expect any deep debugging help, just some off-the-cuff advice on possibilities.
AWS Linux 1
Python 3.6.4
Elasticache Redis 5.0
channels==2.4.0
channels-redis==2.4.2
daphne==2.5.0
Django==2.2.13
Split configuration HTTP served by uwsgi, daphne serves asgi, Nginx
May 10 08:08:16 prod-b-web1: [pid 15053] [version 119.5.10.5086] [tenant_id -] [domain_name -] [pathname /opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/daphne/server.py] [lineno 288] [priority ERROR] [funcname application_checker] [request_path -] [request_method -] [request_data -] [request_user -] [request_stack -] Exception inside application: Lock is not acquired.
Traceback (most recent call last):
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 435, in receive
real_channel
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 484, in receive_single
await self.receive_clean_locks.acquire(channel_key)
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 152, in acquire
return await self.locks[channel].acquire()
File "/opt/python3.6/lib/python3.6/asyncio/locks.py", line 176, in acquire
yield from fut
concurrent.futures._base.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/sessions.py", line 183, in __call__
return await self.inner(receive, self.send)
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/middleware.py", line 41, in coroutine_call
await inner_instance(receive, send)
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/consumer.py", line 59, in __call__
[receive, self.channel_receive], self.dispatch
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels/utils.py", line 58, in await_many_dispatch
await task
File "/opt/releases/r119.5.10.5086/env/lib/python3.6/site-packages/channels_redis/core.py", line 447, in receive
self.receive_lock.release()
File "/opt/python3.6/lib/python3.6/asyncio/locks.py", line 201, in release
raise RuntimeError('Lock is not acquired.')
RuntimeError: Lock is not acquired.
First, lets have a look at the source of the RuntimeError: Lock is not acquired.
error. As given by the traceback, the release()
method in the file /opt/python3.6/lib/python3.6/asyncio/locks.py
is defined like so:
def release(self):
"""Release a lock.
When the lock is locked, reset it to unlocked, and return.
If any other coroutines are blocked waiting for the lock to become
unlocked, allow exactly one of them to proceed.
When invoked on an unlocked lock, a RuntimeError is raised.
There is no return value.
"""
if self._locked:
self._locked = False
self._wake_up_first()
else:
raise RuntimeError('Lock is not acquired.')
A primitive lock is a synchronization primitive that is not owned by a particular thread when locked.
When attempting to release an unlocked lock by calling the release()
method, the RuntimeError
will be raised, as the method should only be called in the locked state. The state changes to unlocked when called in the locked state.
Now for the previous error raised in the acquire()
method in the same file, the acquire()
method is defined like so:
async def acquire(self):
"""Acquire a lock.
This method blocks until the lock is unlocked, then sets it to
locked and returns True.
"""
if (not self._locked and (self._waiters is None or
all(w.cancelled() for w in self._waiters))):
self._locked = True
return True
if self._waiters is None:
self._waiters = collections.deque()
fut = self._loop.create_future()
self._waiters.append(fut)
# Finally block should be called before the CancelledError
# handling as we don't want CancelledError to call
# _wake_up_first() and attempt to wake up itself.
try:
try:
await fut
finally:
self._waiters.remove(fut)
except exceptions.CancelledError:
if not self._locked:
self._wake_up_first()
raise
self._locked = True
return True
So in order for the concurrent.futures._base.CancelledError
error you're getting to be raised, the await fut
must've caused the issue.
To fix it, you can have a look at Awaiting an asyncio.Future raises concurrent.futures._base.CancelledError instead of waiting for a value/exception to be set
Basically, you might have an awaitable in your code that you didn't await, and by not awaiting it, you never handed control back to the event loop or store the awaitable, causing it to be immediately cleaned up, completely cancelling it (and all of the awaitables it controlled).
Simply make sure you await the results of the awaitables in your code, finding any you missed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With