While using MySQL (Aurora), I started noticing queries going missing and workers hanging forever. After some investigation, it turned out that the same code runs fine in AWS, but on Azure it simply hangs, forever!
The reason seems to be that Azure kills connections that are idle. A connection waiting on a long-running query counts as idle, because no packets cross it while the server is working. Note that I can reproduce this on a VM that doesn't even have a load balancer in front of it.
This is reproducible with the following command:
date && time mysql -h$SERVER -u$USER -D mydb -p$PASS -e "SELECT SLEEP(260);"
I tested it with 240 and 250 seconds, which succeed, and at 260 it dies. And it doesn't just fail, it hangs forever! It looks like Azure drops the connection without telling either end, so the MySQL client keeps waiting for a response that will never arrive.
We have queries running from Node.js and Python, so ideally I need a solution that works for both.
See here for an example: https://imgur.com/gallery/FCV8ZWb (note that I had to kill mysql from another session for it to actually exit).
After some research, I found a low-level workaround that should work for any dynamically linked binary: using LD_PRELOAD, I inject a library that turns on TCP keep-alive for every socket the program creates. The library is libdontdie, a fork of an older library, libkeepalive.
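Making a socket keep-alive comes down to a few setsockopt calls. The Python sketch below shows them using the Linux option names. It is my assumption about what a preload library like libdontdie does for each socket, not its actual source, and the function name enable_keepalive is hypothetical. The default values mirror the DD_TCP_KEEPALIVE_* settings in the command that follows.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 4, interval: int = 5, probes: int = 6) -> None:
    """Hypothetical helper: turn on TCP keep-alive for one socket (Linux option names)."""
    # Ask the kernel to send keep-alive probes on this connection at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe (compare DD_TCP_KEEPALIVE_TIME).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes (compare DD_TCP_KEEPALIVE_INTVL).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Unanswered probes before the peer is declared dead (compare DD_TCP_KEEPALIVE_PROBES).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # Read the options back to confirm the kernel accepted them.
    print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    print("TCP_KEEPIDLE:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
    print("TCP_KEEPINTVL:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL))
    print("TCP_KEEPCNT:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT))
    s.close()
```

In an application you control, the same calls can be applied to the driver's socket directly, which avoids LD_PRELOAD. Node.js, for instance, offers socket.setKeepAlive(enable, initialDelay). Whether a particular MySQL driver exposes its underlying socket is something to check per driver.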
After building the library, I run:
date && time DD_DEBUG=1 DD_TCP_KEEPALIVE_TIME=4 DD_TCP_KEEPALIVE_INTVL=5 DD_TCP_KEEPALIVE_PROBES=6 LD_PRELOAD=/usr/lib/libdontdie.so mysql -h$SERVER -u$USER -D mydb -p$PASS -e "SELECT SLEEP(300);"
And it works as expected (tested both on a VM in Azure and in a Docker container inside AKS).
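Assuming Linux keep-alive semantics, the DD_* values in that command explain why this works. After an acknowledged probe the kernel waits another full idle period before probing again, so the first probe at 4 seconds keeps traffic on the connection far more often than the roughly 250-second point where the SLEEP test started failing. Linux's default idle time of 7200 seconds would not help, which is why it has to be lowered. If the other end really is gone, the kernel gives up after roughly idle + interval × probes = 4 + 5 × 6 = 34 seconds, so the client gets an error instead of hanging. A small Python sketch of that arithmetic follows. LONGEST_OK_SLEEP_S is a hypothetical constant taken from the 250-second test above, not a documented Azure value.

```python
# Hypothetical constant: the longest SLEEP that survived in the test above.
LONGEST_OK_SLEEP_S = 250


def detection_window_s(idle: int, interval: int, probes: int) -> int:
    """Approximate seconds before Linux declares an unresponsive peer dead."""
    return idle + interval * probes


def keeps_link_busy(idle: int, cutoff: int = LONGEST_OK_SLEEP_S) -> bool:
    """True if a probe goes out before the connection has been idle for `cutoff` seconds."""
    return idle < cutoff


if __name__ == "__main__":
    idle, interval, probes = 4, 5, 6  # the DD_TCP_KEEPALIVE_* values used above
    print("probe before cutoff:", keeps_link_busy(idle))
    print("kernel default (7200 s) before cutoff:", keeps_link_busy(7200))
    print("dead peer detected after ~", detection_window_s(idle, interval, probes), "s")
```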