 

Apache/mod_wsgi process dies unexpectedly

I'm testing the limits of my Python Flask web application, running on an Apache web server, by making a request that takes over 30 minutes to complete. The request issues thousands of sequential queries to a MySQL database. I understand this should ideally run as a separate asynchronous process outside the Apache server, but let's ignore that for now.

The problem is that although the request runs to completion when I test it on my Mac, it dies abruptly on a Linux server (Amazon Linux on AWS EC2). I've not been able to figure out exactly what's killing it. The server isn't running out of memory; the process uses very little RAM. I've also not found any Apache config parameter or any error message that explains it, even after setting Apache's LogLevel to debug. I need help on where to look. Here are more details about my setup:


Run Time

Server: Across four runs it died after 8 minutes, 27 minutes, 21 minutes and 22 minutes respectively. Note that most of these runs were on a UAT server and this was the only request the server was processing.

Mac: It ran much slower than on the server, but the process completed successfully, taking 2 hours 47 minutes.


Linux Server details:
2 virtual CPUs and 4GB RAM

OS (output of uname -a)
Linux ip-172-31-63-211 3.14.44-32.39.amzn1.x86_64 #1 SMP Thu Jun 11 20:33:38 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Apache error_log: https://drive.google.com/file/d/0B3XXZfJyzJYsNkFDU3hJekRRUlU/view?usp=sharing

Apache config file: https://drive.google.com/file/d/0B3XXZfJyzJYsM2lhSmxfVVRNNjQ/view?usp=sharing

Apache version (output of apachectl -V)

Server version: Apache/2.4.23 (Amazon)  
Server built:   Jul 29 2016 21:42:17  
Server's Module Magic Number: 20120211:61  
Server loaded:  APR 1.5.1, APR-UTIL 1.4.1  
Compiled using: APR 1.5.1, APR-UTIL 1.4.1  
Architecture:   64-bit  
Server MPM:     prefork  
  threaded:     no  
    forked:     yes (variable process count)  
Server compiled with....  
 -D APR_HAS_SENDFILE  
 -D APR_HAS_MMAP  
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)  
 -D APR_USE_SYSVSEM_SERIALIZE  
 -D APR_USE_PTHREAD_SERIALIZE  
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT  
 -D APR_HAS_OTHER_CHILD  
 -D AP_HAVE_RELIABLE_PIPED_LOGS  
 -D DYNAMIC_MODULE_LIMIT=256  
 -D HTTPD_ROOT="/etc/httpd"  
 -D SUEXEC_BIN="/usr/sbin/suexec"  
 -D DEFAULT_PIDLOG="/var/run/httpd/httpd.pid"  
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"  
 -D DEFAULT_ERRORLOG="logs/error_log"  
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"  
 -D SERVER_CONFIG_FILE="conf/httpd.conf"  

Mac details:

Apache config file: https://drive.google.com/file/d/0B3XXZfJyzJYsRUd6NW5NY3lON1U/view?usp=sharing

Apache version (output of apachectl -V)

Server version: Apache/2.4.18 (Unix)  
Server built:   Feb 20 2016 20:03:19  
Server's Module Magic Number: 20120211:52  
Server loaded:  APR 1.4.8, APR-UTIL 1.5.2  
Compiled using: APR 1.4.8, APR-UTIL 1.5.2  
Architecture:   64-bit  
Server MPM:     prefork  
  threaded:     no  
    forked:     yes (variable process count)  
Server compiled with....  
 -D APR_HAS_SENDFILE  
 -D APR_HAS_MMAP  
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)  
 -D APR_USE_FLOCK_SERIALIZE  
 -D APR_USE_PTHREAD_SERIALIZE  
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT  
 -D APR_HAS_OTHER_CHILD  
 -D AP_HAVE_RELIABLE_PIPED_LOGS  
 -D DYNAMIC_MODULE_LIMIT=256  
 -D HTTPD_ROOT="/usr"  
 -D SUEXEC_BIN="/usr/bin/suexec"  
 -D DEFAULT_PIDLOG="/private/var/run/httpd.pid"  
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"  
 -D DEFAULT_ERRORLOG="logs/error_log"  
 -D AP_TYPES_CONFIG_FILE="/private/etc/apache2/mime.types"  
 -D SERVER_CONFIG_FILE="/private/etc/apache2/httpd.conf"  
asked Oct 02 '16 by Kes115

2 Answers

If you are using embedded mode of mod_wsgi, that can happen because Apache controls the lifetime of its processes and can recycle them if it decides a process is no longer required due to insufficient traffic.
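
As a quick sanity check, mod_wsgi records the daemon process group in the WSGI environ, where an empty string means the request is being handled in embedded mode. A minimal sketch of such a check (the /wsgi-mode route is illustrative, not part of the question's app):

from flask import Flask, request

app = Flask(__name__)

# Illustrative diagnostic route: reports whether this request is
# being handled in mod_wsgi embedded mode or in a daemon process.
@app.route('/wsgi-mode')
def wsgi_mode():
    # mod_wsgi sets 'mod_wsgi.process_group' in the environ; it is
    # an empty string when running in embedded mode.
    group = request.environ.get('mod_wsgi.process_group', '')
    if group:
        return 'daemon mode (process group %s)' % group
    return 'embedded mode'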

You might be thinking 'but I am using daemon mode and not embedded mode', but in reality you aren't, because your configuration is wrong. You have:

<VirtualHost *:5010>
    ServerName localhost

    WSGIDaemonProcess entry user=kesiena group=staff threads=5
    WSGIScriptAlias "/" "/Users/kesiena/Dropbox (MIT)/Sites/onetext/onetext.local.wsgi"

    <Directory "/Users/kesiena/Dropbox (MIT)/Sites/onetext/app">
        WSGIProcessGroup start
        WSGIApplicationGroup %{GLOBAL}
        WSGIScriptReloading On
        Order deny,allow
        Allow from all
    </Directory>
</VirtualHost>

That Directory block doesn't cover the directory containing the WSGI script file named in WSGIScriptAlias, so none of the directives inside it apply.

Use:

<VirtualHost *:5010>
    ServerName localhost

    WSGIDaemonProcess entry user=kesiena group=staff threads=5
    WSGIScriptAlias "/" "/Users/kesiena/Dropbox (MIT)/Sites/onetext/onetext.local.wsgi"

    <Directory "/Users/kesiena/Dropbox (MIT)/Sites/onetext">
        WSGIProcessGroup entry
        WSGIApplicationGroup %{GLOBAL}
        Order deny,allow
        Allow from all
    </Directory>
</VirtualHost>

Note that WSGIProcessGroup must name the daemon process group defined by WSGIDaemonProcess, so it is entry here, not start as in your original file; otherwise mod_wsgi cannot delegate requests to the daemon process.
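
For reference, the WSGI script file that WSGIScriptAlias points at would normally look something like the following. This is only a sketch: it assumes the Flask application object is named app inside the app package next to the script, which may not match the real project layout.

# onetext.local.wsgi -- sketch of a typical mod_wsgi entry point.
import sys

# Make the project directory importable (path taken from the
# WSGIScriptAlias above).
sys.path.insert(0, '/Users/kesiena/Dropbox (MIT)/Sites/onetext')

# mod_wsgi looks for a callable named ``application`` in this file.
# Assumes the Flask app object is called ``app`` in the ``app`` package.
from app import app as application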

The only reason it worked at all without that match is that you had opened up access in Apache to files under that directory with:

<Directory "/Users/kesiena/Dropbox (MIT)/Sites">
    Require all granted
</Directory>

It is also bad practice to set DocumentRoot to a parent directory of the one holding your application source code. As written, someone could come in on a different port or VirtualHost and download all of your application code.

Do not put your application code under the directory named by DocumentRoot.

BTW, even when you have the WSGI application running in daemon mode, Apache can still recycle the worker processes it uses to proxy requests through to mod_wsgi. So even if your very long-running request keeps running in the WSGI application process, it could fail as soon as it starts to send a response if the worker process that accepted it was recycled in the interim for having run too long.

You should definitely farm the long-running operation out to a back-end task queue such as Celery.
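
As a rough sketch of that pattern (the broker URL, task name and job details are placeholders, not taken from the question's code):

# tasks.py -- minimal Celery sketch for moving the long-running
# database work out of the Apache request/response cycle.
from celery import Celery

# Placeholder broker URL; use whatever broker you actually run.
celery_app = Celery('tasks', broker='redis://localhost:6379/0')

@celery_app.task
def run_report(job_id):
    # Stand-in for the thousands of sequential MySQL queries; this
    # now executes in a Celery worker, not in an Apache process.
    ...

The Flask view then just enqueues the work with run_report.delay(job_id) and returns immediately (for example a 202 status plus a task id), and the client polls a status endpoint instead of holding an HTTP connection open for half an hour.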

answered Oct 06 '22 by Graham Dumpleton

You might be hitting forced socket closures, though given the times you quoted that does not look too likely. On a project I had on Azure, any connection that was idle for about 3 minutes would be closed by the system. I believe these closures happened upstream of the server, in the network routing, so there was no way to disable them or increase the timeout.

answered Oct 06 '22 by Brad Howes