Users erratically getting CancelledError with Django ASGI

Question

Our users are erratically getting CancelledError on any page in our system. The only pattern we've observed is that it happens more often on pages that normally take longer to load. But it is not limited to such pages; it can happen anywhere in our system, e.g. on the login page. None of the affected pages use async code or channels; they are standard Django views working in the request/response model (we migrated to ASGI only recently, and the single page that uses channels works just fine). We cannot reproduce it consistently.

What we see in sentry.io:

CancelledError: null
  File "channels/http.py", line 198, in __call__
    await self.handle(scope, async_to_sync(send), body_stream)
  File "asgiref/sync.py", line 435, in __call__
    ret = await asyncio.wait_for(future, timeout=None)
  File "asyncio/tasks.py", line 414, in wait_for
    return await fut

Locally and in the Daphne logs it looks like this:

2022-10-12 20:00:00,000 WARNING Application instance <Task pending coro=<ProtocolTypeRouter.__call__() running at /home/deploy/.virtualenvs/…/lib/python3.7/site-packages/channels/routing.py:71> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.7/asyncio/futures.py:348, <TaskWakeupMethWrapper object at 0x7f1adcbf9610>()]>> for connection <WebRequest at 0x7f1adcc6bb50 method=POST uri=/dajaxice/operations.views.calculate_cost_view/ clientproto=HTTP/1.0> took too long to shut down and was killed.
2022-10-12 20:00:00,000 WARNING Application timed out while sending response

From the user’s POV, the page simply fails to load and they have to re-click a button or refresh the page.

Libraries we use:

python = 3.7
Django = 2.2.12
channels = 3.0.5
channels-redis = 3.4.1

On the server we use: Nginx, supervisor, Daphne.

For all requests (HTTP and websockets) we use ASGI.

Our command for running daphne: daphne -t 300 project.asgi:application

What we already tried to do:

Adding a timeout to Daphne (as you can see above)
Updating the channels library from 3.0.4 to 3.0.5 (because we found reports that asgiref 3.3.1, which is used with channels 3.0.4, could be the culprit for this issue: https://lightrun.com/answers/django-channels-warning---server---application-instance-took-too-long-to-shut-down-and-was-killed)

Any idea what this is caused by or how to troubleshoot it?

JM217 · Accepted Answer

I had a similar issue before with almost the same tech stack, and it took us several days to fix.

In our case the cause was that the database server was running out of resources. We used AWS RDS (MySQL), and its CPU usage was over 99% whenever we got the error.

Using AWS CloudWatch, you can check the CPU utilization history. (There are many other metrics to watch, but CPU utilization was the only problematic one for us.)
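To line up the error timestamps from Sentry with the CPU history, something like the following rough sketch could help. It assumes the datapoints have already been fetched and look like the dictionaries CloudWatch returns for the `CPUUtilization` metric in the `AWS/RDS` namespace (a `Timestamp` and an `Average` per datapoint). The helper names `cpu_spikes` and `near_spike`, the 90% threshold, the 5-minute window, and the sample values are all made up for illustration.

```python
from datetime import datetime, timedelta


def cpu_spikes(datapoints, threshold=90.0):
    """Return (timestamp, average) pairs at or above the threshold, oldest first."""
    hot = [
        (point["Timestamp"], point["Average"])
        for point in datapoints
        if point["Average"] >= threshold
    ]
    return sorted(hot)


def near_spike(error_time, spikes, window=timedelta(minutes=5)):
    """True if error_time falls within `window` of any CPU spike."""
    return any(abs(error_time - ts) <= window for ts, _ in spikes)


# Made-up sample data in 5-minute periods (not real measurements).
start = datetime(2022, 10, 12, 19, 50)
sample = [
    {"Timestamp": start + timedelta(minutes=5 * i), "Average": avg}
    for i, avg in enumerate([35.0, 62.5, 99.2, 99.6, 41.0])
]

spikes = cpu_spikes(sample)
error_time = datetime(2022, 10, 12, 20, 0)  # time of a CancelledError event
print(near_spike(error_time, spikes))
```

If most of the errors land next to a spike, the database is a strong suspect; if they do not, the cause is probably somewhere else.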


After upgrading the DB instance type, the problems were gone right away.
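As for why a slow database would surface as CancelledError: one plausible reading of the traceback and the Daphne warning in the question is that the server gives up on the request and cancels the task that is still waiting for the synchronous view to finish. The sketch below reproduces that effect with plain asyncio only. It is not the actual channels or Daphne code, and `slow_view` and `handle_request` are hypothetical stand-ins.

```python
import asyncio
import time


def slow_view():
    # Stand-in for a synchronous Django view blocked on a slow database query.
    time.sleep(0.3)
    return "response"


async def handle_request():
    # Run the blocking view in a worker thread and wait for its result.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, slow_view)


async def main():
    task = asyncio.ensure_future(handle_request())
    await asyncio.sleep(0.05)  # the "server" stops waiting before the view is done
    task.cancel()              # the waiting task is cancelled from outside
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"
    return "completed"


result = asyncio.run(main())
print(result)
```

The view itself never raises anything; the error comes from whatever was awaiting it, which would explain why it can hit any page that happens to be slow at that moment.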

For more details, see the AWS documentation on monitoring RDS with CloudWatch.
