Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PHP-FPM with mysqlnd stuck on connect and not timing out (gdb backtrace inside)

Tags:

php

mysql

I am running PHP-FPM, and have an issue during times of extremely high load that cause php processes to get stuck forever. I did a GDB backtrace of a running process that was stuck, and got this (irrelevant frames removed):

#0  0x00007ff51704bb90 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:81

#1  0x0000000000694694 in poll (__timeout=<optimized out>, __nfds=1, __fds=0x7fff18a2c800) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46

#2  php_pollfd_for (timeouttv=0x2c18a30, events=25, fd=<optimized out>) at /build/buildd/php5-5.5.19+dfsg/main/php_network.h:165

#3  php_sock_stream_wait_for_data (stream=0x2df2b88, sock=0x2c18a28) at /build/buildd/php5-5.5.19+dfsg/main/streams/xp_socket.c:131

#4  php_sockop_read (stream=0x2df2b88, buf=0x2e03628 "X\221\340\002", count=4) at /build/buildd/php5-5.5.19+dfsg/main/streams/xp_socket.c:154

#5  0x00000000004a629a in php_openssl_sockop_read (stream=0x2df2b88, buf=0x2e03628 "X\221\340\002", count=<optimized out>) at /build/buildd/php5-5.5.19+dfsg/ext/openssl/xp_ssl.c:234

#6  0x0000000000688926 in _php_stream_fill_read_buffer (stream=stream@entry=0x2df2b88, size=size@entry=4) at /build/buildd/php5-5.5.19+dfsg/main/streams/streams.c:691

#7  0x0000000000688a87 in _php_stream_read (stream=stream@entry=0x2df2b88, buf=buf@entry=0x7fff18a2c9e0 "", size=size@entry=4) at /build/buildd/php5-5.5.19+dfsg/main/streams/streams.c:738

#8  0x00007ff5164b83a6 in php_mysqlnd_net_network_read_ex_pub (net=<optimized out>, buffer=<optimized out>, count=4, stats=0x2add010, error_info=<optimized out>)
    at /build/buildd/php5-5.5.19+dfsg/ext/mysqlnd/mysqlnd_net.c:80

#9  0x00007ff5164b7cb6 in php_mysqlnd_net_receive_ex_pub (net=0x2e04238, buffer=0x7fff18a2c9e0 "", count=4, conn_stats=0x2add010, error_info=0x2e13860)
    at /build/buildd/php5-5.5.19+dfsg/ext/mysqlnd/mysqlnd_net.c:682

#10 0x00007ff5164b118b in mysqlnd_read_header (net=0x2e04238, conn_stats=0x2add010, error_info=<optimized out>, header=<optimized out>, header=<optimized out>)
    at /build/buildd/php5-5.5.19+dfsg/ext/mysqlnd/mysqlnd_wireprotocol.c:287

#11 0x00007ff5164b1d46 in php_mysqlnd_greet_read (_packet=0x2e14e98, conn=0x2e13728) at /build/buildd/php5-5.5.19+dfsg/ext/mysqlnd/mysqlnd_wireprotocol.c:333

#12 0x00007ff5164ab8ad in php_mysqlnd_conn_data_connect_handshake_pub (conn=0x2e13728, host=<optimized out>, user=0x2b49d70 "testuser", passwd=0x2b49c40 "<PASSWORD REMOVED>", passwd_len=7, db=0x2b43c18 "testuser",
    db_len=7, mysql_flags=959117) at /build/buildd/php5-5.5.19+dfsg/ext/mysqlnd/mysqlnd.c:774

#13 0x00007ff5164a9e71 in php_mysqlnd_conn_data_connect_pub (conn=0x2e13728, host=0x2e13558 "localhost", user=0x2b49d70 "testuser",
    passwd=0x2b49c40 "<PASSWORD REMOVED>", passwd_len=7, db=0x2b43c18 "testuser", db_len=7, port=3306, socket_or_pipe=0x7ff50a563b4c "/var/run/mysqld/mysqld.sock", mysql_flags=959117)
    at /build/buildd/php5-5.5.19+dfsg/ext/mysqlnd/mysqlnd.c:958

#14 0x00007ff5164a6430 in php_mysqlnd_conn_connect_pub (conn_handle=0x2e136d8, host=0x2e13558 "localhost", user=0x2b49d70 "testuser",
    passwd=0x2b49c40 "<PASSWORD REMOVED>", passwd_len=7, db=0x2b43c18 "testuser", db_len=7, port=3306, socket_or_pipe=0x7ff50a563b4c "/var/run/mysqld/mysqld.sock", mysql_flags=131072)
    at /build/buildd/php5-5.5.19+dfsg/ext/mysqlnd/mysqlnd.c:1098

#15 0x00007ff5164adc07 in mysqlnd_connect (conn_handle=0x2e136d8, host=<optimized out>, user=<optimized out>, passwd=<optimized out>, passwd_len=<optimized out>, db=<optimized out>, db_len=7, port=3306,
    socket_or_pipe=0x7ff50a563b4c "/var/run/mysqld/mysqld.sock", mysql_flags=131072) at /build/buildd/php5-5.5.19+dfsg/ext/mysqlnd/mysqlnd.c:1131

#16 0x00007ff50a55fe82 in mysqli_common_connect (ht=<optimized out>, return_value=0x2e05978, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>,
    is_real_connect=is_real_connect@entry=0 '\000', in_ctor=in_ctor@entry=1 '\001') at /build/buildd/php5-5.5.19+dfsg/ext/mysqli/mysqli_nonapi.c:242

#17 0x00007ff50a560633 in zif_mysqli_link_construct (ht=<optimized out>, return_value=<optimized out>, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>)
    at /build/buildd/php5-5.5.19+dfsg/ext/mysqli/mysqli_nonapi.c:320

So what I can see is that I'm attempting to connect to MySQL and then PHP is getting stuck polling the socket. It's possible MySQL dropped or rejected the connection (the database was at 100% cpu load).

However, I have mysql.connect_timeout set to 60, so I'd expect the connection to not last this long (it's been over 20 minutes). default_socket_timeout is also set to 300 seconds.

And I'll pre-empt any suggestions to upgrade to mysqli by saying that I've tried that (we're using a DBAL), and had the same issue as mysqli uses the same connection functions under the hood.

I am running PHP 5.5.19 on Ubuntu, and using the mysqlnd driver.

Any idea what could cause PHP to not timeout?

like image 978
Brandon Wamboldt Avatar asked Dec 09 '14 16:12

Brandon Wamboldt


2 Answers

I think you're hitting what I call the "dead usability band". What that means is that you're stuck between having your action completely fail (where it would time out) and yet not function either. So the chain of things is this

  1. You call your PHP script
  2. PHP attempts to open a connection to MySQL
  3. MySQL (at 100% load) is slow to respond but does open connection within the allotted timeout
  4. PHP sends the SQL
  5. MySQL (still at 100% load) is now likely stuck. The query eventually executes, but now you have the problem of of actually returning a response. So PHP gets... something. Say a trickle. MySQL won't time out (it's got a valid request it's responding to) and neither will PHP (it's getting data at a painfully slow rate). So both programs get stuck in this "dead band" where just enough is happening to avoid a timeout, but not enough to resolve the deadlock. You can sometimes see this in action doing a SHOW PROCESSLIST;

I realize that you're not sending SQL but I think the same principles come into play. You can sometimes see this happen hitting a remote database using MySQL Workbench (where a query executes but fails to return results). So Workbench gets hung up waiting on a response that will never practically happen.

So how to fix this? As a practical matter, you can't at the programming level of PHP. The PHP API doesn't expose enough of the underlying calls to allow you to recognize that you're stuck in a dead band and terminate the connection. And writing a script that looks for dead bands will always run the risk of terminating otherwise good processes. Your best bet (as much as I hate to say it) is to bolster your MySQL and make sure it has enough resources to not hit 100%. If you're sharing a server with PHP/Apache that's a recipe for this kind of thing. I've noticed it rarely happens on our bigger and better system since we moved MySQL to its own instance. Also, check out mysqltuner. It can look at your instance of MySQL and suggest configuration tuning to improve performance.

like image 121
Machavity Avatar answered Sep 19 '22 11:09

Machavity


A few things to check:

  • Network saturation. Maybe dropped frames.
  • Server side error logs.
  • You can check with netstat which port your php opened, and you can get the remote port on mysql server side, where you can locate maybe the given mysql thread and possible using mysql debugging tools maybe you can its status information. Anyway netstat can help you to check the connection's status. If it is established on both sides then the problem is on application level, if the connection is in handshake or break down or something similar then a network issue is also probable.
  • If you can reproduce this behaviour easily then it worth a try to check with strace the given process. The timeout settings are usually implemented with alarm() or something similar.
  • Maybe it would be useful to show us your php code where you connect. (Or a minimal example which can reproduce the issue.)
like image 31
Lajos Veres Avatar answered Sep 20 '22 11:09

Lajos Veres