A resource on my webapp takes nearly a minute to load after a long stall. This happens consistently. As shown below, only 3 requests on this page actually hit the server itself, the rest hit the memory or disk cache. This problem only seems to occur on Chrome, both Safari and Firefox do not exhibit this behavior.
I have implemented the Cache-Control: no-store
suggestion in this SO question but the problem persists. request stalled for a long time occasionally in chrome
Also included below is an example of what the response looks like once it finally does come in.
My app is hosted in AWS behind a Network Load Balancer which proxies to an EC2 instance running nginx and the app itself.
Any ideas what is causing this?
I encountered the exact same problem. We are using Elastic Beanstalk with Network Load Balancer (NLB) with TLS termination at NLB.
The feedback I got from AWS support is:
This problem can occur when a client connects to a TLS listener on a Network Load Balancer and does not send data immediately after completing the TLS handshake. The root cause is an edge case in the handling of new connections. Note that this only occurs if the Target Group for the TLS listener is configured to use the TCP protocol without Proxy Protocol v2 enabled
They are working on a fix for this issue now. Somehow this problem can only be noticed when you are using Chrome browser.
In the meantime, you have these 2 options as workaround:
I know it's a late answer but I write it for someone seeking a solution.
TL;DR: In my case, enabling cross-zone load balancing
attribute of NLB solved the problem.
With investigation using WireShark I figured out there were two different IPv4 addresses Chrome communicated with. Sending packets to one of them always succeeded and to the other always failed. Actually the two addresses delegated two Availability Zones.
By default, cross-zone load balancing is disabled if you choose NLB (on the contrary the same attribute of ALB is enabled by default). Let's say there are two AZs; AZ-1 / AZ-2. When you attach both AZs to a NLB, it has a node for each AZ.
The node belongs to AZ-1 just routes traffic to instances which also belong to AZ-1. AZ-2 instances are ignored. My modest app (hosted on Fargate) has just one app server (ECS task) in AZ-2 so that the NLB node in AZ-1 cannot route traffic to anywhere.
I'm not familiar with TCP/IP or Browser implementation but in my understanding, your browser somehow selects the actual ip address after DNS lookup. If the AZ-2 node is selected in the above case then everything goes fine, but if the AZ-1 is selected your browser starts stalling. Maybe Chrome has a random strategy to select ip while Safari or Firefox has a sticky one, so that the problem only appears on Chrome.
After enabling cross-zone load balancing
the ECS task on AZ-2 is visible from the AZ-1 NLB node, and it works fine with Chrome browser too.
(Please feel free to correct my poor English. Thank you!)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With