We hosted a site on an AWS EC2 instance of type c4.8xlarge. It is a fairly large system with a lot of memory and compute resources. Thousands of users tried to access the system during a 2-hour timeframe this weekend. While it did not crash, it slowed down quite a bit and failed to perform at the expected level. Analyzing the stats suggested that limited network bandwidth was the main cause of the slowdown. The CPU usage stayed below 6%, but NetworkIn and NetworkOut seem to have peaked at 60MB and 200MB respectively during that timeframe. While I'm not a networking expert, some reading online seemed to indicate that all the traffic going through one NIC could be the main cause of the limited network bandwidth. Is this true? Would hosting the site on a different type of EC2 instance help increase the network bandwidth? Here is what the NetworkIn and NetworkOut metrics looked like under heavy load.
If you were limited by bandwidth, that graph would go flat once you hit the limit. Further, as others pointed out, that is only about 1 MB/s in and 3 MB/s out, and I can do more than that on a t2.micro to the external internet.
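The conversion behind those per-second figures is simple arithmetic. As a minimal sketch, assuming the graphed NetworkIn and NetworkOut values are byte totals per 60-second CloudWatch period (the period is an assumption; with 5-minute periods the rates would be five times lower), and using the hypothetical helper name `to_mb_per_s`:

```python
def to_mb_per_s(total_bytes: float, period_seconds: float = 60.0) -> float:
    """Convert a per-period byte total into an average rate in MB/s."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return total_bytes / period_seconds / 1_000_000


# Peak values reported in the question: 60 MB in and 200 MB out per period.
# The 60-second period is assumed, not stated in the question.
peak_in = to_mb_per_s(60_000_000)
peak_out = to_mb_per_s(200_000_000)
print(f"in: {peak_in:.1f} MB/s, out: {peak_out:.1f} MB/s")
```

Under that assumption both rates come out at a few MB/s, which is why the numbers alone do not point to a bandwidth ceiling.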
What is the system doing with each request? Here is a list of things I would look at, in order:
It's not impossible, but very unlikely, that you are bottlenecked at the network layer on connection creation or packets per second, if your requests are very small.
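To see why request size matters for that last case, here is a rough sketch that estimates packet rate from throughput and an assumed average packet size. The helper name `packets_per_second` and the packet sizes are illustrative assumptions, not measurements from this system; the CloudWatch NetworkPacketsIn and NetworkPacketsOut metrics would give the actual counts.

```python
def packets_per_second(bytes_per_second: float, avg_packet_bytes: float) -> float:
    """Estimate packet rate for a given throughput and average packet size."""
    if avg_packet_bytes <= 0:
        raise ValueError("avg_packet_bytes must be positive")
    return bytes_per_second / avg_packet_bytes


throughput = 3_300_000  # roughly 3.3 MB/s, an assumed figure for illustration

# Compare near-full-size packets with very small ones (both sizes assumed).
for size in (1500, 100):
    rate = packets_per_second(throughput, size)
    print(f"{size}-byte packets: {rate:.0f} packets/s")
```

The same byte rate translates into many more packets when each packet is small, so a packet-rate limit can be reached while the bandwidth graph still looks low.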