
Errors shown by k6 when reaching a bigger number of virtual users

I'm evaluating k6 for my load testing needs. I've set up a basic load test and I'm currently trying to interpret the error messages and result values I get. Maybe someone can help me interpret what I'm seeing:

If I crank up the VUs to about 300, I start seeing error messages in the console, and at 500 I get lots of them.

These mostly consist of:

  • dial tcp XXX:443: i/o timeout
  • read tcp YYY(local ip):35252->XXX(host ip):443: read: connection reset by peer
  • level=warning msg="Request Failed" error="unexpected EOF"
  • Get https://REQUEST_URL/: context deadline exceeded"

I also have problems with several checks:

  • check errors in which res.status === 0 and res.body === null
  • check errors in which res.status === 0, but the body contains the correct content

How can res.status be 0 but the body still contains the proper values?

I suspect that I'm reaching the connection limit of my load producing machine and that's why I get the error messages. So I'd have to set up a cluster or move to the Cloud runners!?

The stats generated by k6 show long http_req_blocked values, which I interpret as the time waiting to get a connection port. This seems to indicate that the connection pool of my test running machine is at its limits.

http_req_blocked...........: avg=5.66s    min=0s    med=3.26s    max=59.38s p(90)=13.12s   p(95)=20.31s 
http_req_connecting........: avg=1.85s    min=0s    med=280.16ms max=24.27s p(90)=4.2s     p(95)=9.24s  
http_req_duration..........: avg=2.05s    min=0s    med=496.24ms max=1m0s   p(90)=4.7s     p(95)=8.39s  
http_req_receiving.........: avg=600.94ms min=0s    med=82.89µs  max=58.8s  p(90)=436.95ms p(95)=2.67s  
http_req_sending...........: avg=1.42ms   min=0s    med=35.8µs   max=11.76s p(90)=56.22µs  p(95)=62.45µs
http_req_tls_handshaking...: avg=3.85s    min=0s    med=1.78s    max=58.49s p(90)=8.93s    p(95)=15.81s 
http_req_waiting...........: avg=1.45s    min=0s    med=399.43ms max=1m0s   p(90)=3.23s    p(95)=5.87s 

Can anyone help me interpret the results I'm seeing?

SebastianR asked Mar 03 '23 14:03


1 Answer

You are likely running out of CPU on the runner. As explained in the HTTP-specific metrics section of the documentation, you are right about http_req_blocked: it is (mostly) the time from when we say we want to make a request to when we get a socket on which to make it. The long blocked times are most likely because:

  1. the test runner is running out of CPU and can't handle both making all the other requests and starting new ones
  2. the system under test is running out of CPU and has ... the same problem

You will need to monitor both (you are highly advised to do this regardless), as tests run at 100% runner CPU are probably not very representative :) and you likely don't want the system you are testing to reach 100% either.
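One way to line up CPU graphs with the VU count is to ramp up in steps instead of jumping straight to 500, and to stop the run once waiting for a socket starts to dominate. The following is only a sketch: `buildOptions` is a hypothetical helper, and the stage durations, targets, and the 1000 ms limit are illustrative assumptions rather than recommended values. It is written as plain JavaScript; in a real k6 script the resulting object would be exported as `options`.

```javascript
// Hypothetical helper: builds a k6-style options object with a stepped ramp.
// Stepping up slowly makes it easier to see at which VU count the runner's
// (or the target's) CPU saturates.
function buildOptions(maxVUs, blockedP95LimitMs) {
  return {
    stages: [
      { duration: '2m', target: Math.round(maxVUs / 4) },
      { duration: '2m', target: Math.round(maxVUs / 2) },
      { duration: '2m', target: maxVUs },
      { duration: '1m', target: 0 }, // ramp back down
    ],
    thresholds: {
      // Abort the test once the 95th percentile of http_req_blocked
      // exceeds the given limit (in milliseconds).
      http_req_blocked: [
        { threshold: `p(95)<${blockedP95LimitMs}`, abortOnFail: true },
      ],
    },
  };
}

// In a k6 script this would read: export const options = buildOptions(500, 1000);
const options = buildOptions(500, 1000);
```

If the threshold trips at a low VU count while the target's CPU is still idle, that points at the runner rather than the system under test.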

A status code of 0 means that we couldn't make the request or read the response for some reason, which is usually explained by the error and error_code fields.
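To see which failure is behind each status 0, the error and error_code can be logged next to the failing check. A minimal sketch, assuming `res` has the shape of a k6 response (`status`, `error`, `error_code`); `describeFailure` is a hypothetical helper name, and the k6 wiring is shown only in comments so the snippet stays plain JavaScript.

```javascript
// Hypothetical helper: turns a failed response into a one-line description.
// `res` is assumed to look like a k6 Response object.
function describeFailure(res) {
  if (res.status !== 0) {
    return null; // an HTTP response was received; nothing to explain here
  }
  return `request failed: error="${res.error}" error_code=${res.error_code}`;
}

// Assumed usage inside a k6 default function:
//   const res = http.get(url);
//   const msg = describeFailure(res);
//   if (msg) console.warn(msg);
```

Grouping the logged messages by error_code should show whether dial timeouts, resets, or request timeouts dominate at a given VU count.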

As I commented, if you have status code 0 and a body, this is most likely a bug ... at least I don't remember any case where that combination would be legitimate.

The errors you have listed mean (most likely):

dial tcp XXX:443: i/o timeout

This literally means that we tried to get a TCP connection and it took too long (probably the reason for the large http_req_blocked).

read tcp YYY(local ip):35252->XXX(host ip):443: read: connection reset by peer

The other side closed the connection, likely because some timeout was reached. For example, if we don't read for over 30 seconds, the server decides that we won't read anymore and closes the connection. When CPU is at 100%, there is a good chance some connections won't get time to be read from.

level=warning msg="Request Failed" error="unexpected EOF"

Literally what it says: the connection was closed when we totally didn't expect it, or more accurately when the Go net/http stdlib didn't expect it. Likely a timeout again, just at a point in the life of the request where the other errors aren't returned.

Get https://REQUEST_URL/: context deadline exceeded"

This is because a request took longer than the timeout (60s by default); at some point this will be changed to a better error message.
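The timeout can be set per request through the request params, where k6 accepts a duration string such as '60s'. A small sketch, with `paramsWithTimeout` as a hypothetical helper and 120 seconds as an arbitrary example value; note that raising the timeout only hides saturation, it does not fix it.

```javascript
// Hypothetical helper: builds per-request params with an explicit timeout,
// expressed as a duration string in seconds.
function paramsWithTimeout(seconds) {
  return { timeout: `${seconds}s` };
}

// Assumed usage inside a k6 script:
//   const res = http.get('https://REQUEST_URL/', paramsWithTimeout(120));
const params = paramsWithTimeout(120);
```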

Михаил Стойков answered May 13 '23 02:05