At the end of last week I noticed a problem on one of my medium AWS instances where Nginx always returns a HTTP 499 response if a request takes more than 60 seconds. The page being requested is a PHP script
I've spent several days trying to find answers and have tried everything that I can find on the internet including several entries here on Stack Overflow, nothing works.
I've tried modifying the PHP settings, PHP-FPM settings and Nginx settings. You can see a question I raised on the NginX forums on Friday (http://forum.nginx.org/read.php?9,237692) though that has received no response so I am hoping that I might be able to find an answer here before I am forced to moved back to Apache which I know just works.
This is not the same problem as the HTTP 500 errors reported in other entries.
I've been able to replicate the problem with a fresh micro AWS instance of NginX using PHP 5.4.11.
To help anyone who wishes to see the problem in action I'm going to take you through the set-up I ran for the latest Micro test server.
You'll need to launch a new AWS Micro instance (so it's free) using the AMI ami-c1aaabb5
This PasteBin entry has the complete set-up to run to mirror my test environment. You'll just need to change example.com within the NginX config at the end
http://pastebin.com/WQX4AqEU
Once that's set-up you just need to create the sample PHP file which I am testing with which is
<?php sleep(70); die( 'Hello World' ); ?>
Save that into the webroot and then test. If you run the script from the command line using php or php-cgi, it will work. If you access the script via a webpage and tail the access log /var/log/nginx/example.access.log, you will notice that you receive the HTTP 1.1 499 response after 60 seconds.
Now that you can see the timeout, I'll go through some of the config changes I've made to both PHP and NginX to try to get around this. For PHP I'll create several config files so that they can be easily disabled
Update the PHP FPM Config to include external config files
sudo echo ' include=/usr/local/php/php-fpm.d/*.conf ' >> /usr/local/php/etc/php-fpm.conf
Create a new PHP-FPM config to override the request timeout
sudo echo '[www] request_terminate_timeout = 120s request_slowlog_timeout = 60s slowlog = /var/log/php-fpm-slow.log ' > /usr/local/php/php-fpm.d/timeouts.conf
Change some of the global settings to ensure the emergency restart interval is 2 minutes
# Create a global tweaks sudo echo '[global] error_log = /var/log/php-fpm.log emergency_restart_threshold = 10 emergency_restart_interval = 2m process_control_timeout = 10s ' > /usr/local/php/php-fpm.d/global-tweaks.conf
Next, we will change some of the PHP.INI settings, again using separate files
# Log PHP Errors sudo echo '[PHP] log_errors = on error_log = /var/log/php.log ' > /usr/local/php/conf.d/errors.ini sudo echo '[PHP] post_max_size=32M upload_max_filesize=32M max_execution_time = 360 default_socket_timeout = 360 mysql.connect_timeout = 360 max_input_time = 360 ' > /usr/local/php/conf.d/filesize.ini
As you can see, this is increasing the socket timeout to 3 minutes and will help log errors.
Finally, I'll edit some of the NginX settings to increase the timeout's that side
First I edit the file /etc/nginx/nginx.conf and add this to the http directive fastcgi_read_timeout 300;
Next, I edit the file /etc/nginx/sites-enabled/example which we created earlier (See the pastebin entry) and add the following settings into the server directive
client_max_body_size 200; client_header_timeout 360; client_body_timeout 360; fastcgi_read_timeout 360; keepalive_timeout 360; proxy_ignore_client_abort on; send_timeout 360; lingering_timeout 360;
Finally I add the following into the location ~ .php$ section of the server dir
fastcgi_read_timeout 360; fastcgi_send_timeout 360; fastcgi_connect_timeout 1200;
Before retrying the script, start both nginx and php-fpm to ensure that the new settings have been picked up. I then try accessing the page and still receive the HTTP/1.1 499 entry within the NginX example.error.log.
So, where am I going wrong? This just works on apache when I set PHP's max execution time to 2 minutes.
I can see that the PHP settings have been picked up by running phpinfo() from a web-accessible page. I just don't get, I actually think that too much has been increased as it should just need PHP's max_execution_time, default_socket_timeout changed as well as NginX's fastcgi_read_timeout within just the server->location directive.
Having performed some further test to show that the problem is not that the client is dying I have modified the test file to be
<?php file_put_contents('/www/log.log', 'My first data'); sleep(70); file_put_contents('/www/log.log','The sleep has passed'); die('Hello World after sleep'); ?>
If I run the script from a web page then I can see the content of the file be set to the first string. 60 seconds later the error appears in the NginX log. 10 seconds later the contents of the file changes to the 2nd string, proving that PHP is completing the process.
Setting fastcgi_ignore_client_abort on; does change the response from a HTTP 499 to a HTTP 200 though nothing is still returned to the end client.
Having installed Apache and PHP (5.3.10) onto the box straight (using apt) and then increasing the execution time the problem does appear to also happen on Apache as well. The symptoms are the same as NginX now, a HTTP200 response but the actual client connection times out before hand.
I've also started to notice, in the NginX logs, that if I test using Firefox, it makes a double request (like this PHP script executes twice when longer than 60 seconds). Though that does appear to be the client requesting upon the script failing
HTTP code 499 usually is seen in NGINX's logs. Being a web server, NGINX 499 is able to identify that the problem is not in the server itself, or the entity which sent the request. HTTP error 499 simply means that the client shut off in the middle of processing the request through the server.
The cause of the problem is the Elastic Load Balancers on AWS. They, by default, timeout after 60 seconds of inactivity which is what was causing the problem.
So it wasn't NginX, PHP-FPM or PHP but the load balancer.
To fix this, simply go into the ELB "Description" tab, scroll to the bottom, and click the "(Edit)" link beside the value that says "Idle Timeout: 60 seconds"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With