So, I am currently rebuilding my web platform from a single machine to a cluster of machines, using Apache load balancing, but I have two questions that I need good answers to before proceeding. I have Googled and searched here on SO, but didn't find anything good.
My setup will be one Debian machine running the Apache load-balancing server (i.e. Apache with mod_proxy) and any number of "slave" machines that are the balancer members. All of these are VPSes inside a VMware host, so setting up new slaves as needed will be trivial.
Log files
The first question is that of log files. To troubleshoot my platform, I sometimes need to analyze Apache log files, both access logs and error logs. Since the load is evenly distributed (I don't know if I'll even use sticky balancing; any host could probably handle any request at any time), the log entries are spread across the slave Apache instances as well. Is there a way to consolidate these live, meaning that my live log analyzer could see the log files from all hosts? I understand that doing so while the files sit on several hosts would be difficult, so is there a way to make sure that all log files are kept on one server?
I'm thinking about two things myself, but I would greatly appreciate your input.
syslogd
The first is syslogd, where it would be possible for several hosts to write to one logging host. The problem with this is that in my current setup, each virtual host in Apache has its own log file. That could probably be fixed in some manner, though. My main goal here is troubleshooting, not keeping separate logs for each virtual host (although if both goals could be met, that would certainly be a bonus).
NFS
My next thought was NFS, i.e. an NFS share on the LAN where each slave writes to the same log file. I'm going to go ahead and assume that this will be difficult, since slave 1 would open the log file and then slave 2 wouldn't be able to write to it.
As I said, your input is greatly appreciated since I feel I'm stuck in how to solve this.
Configuration files
This is another thing altogether. Each slave will respond to each request as if acting as one single server; that is the entire idea. But what about making changes to the Apache configuration files, adding virtual hosts, setting up other parameters? What if I have ten slaves, or fifty? Is there a way to make sure that all these slaves are always in sync? I am already using an NFS export to make sure they all have the same files, but should I use the same approach with the configuration files? Or should I keep these in some form of repository and use rsync to copy them out to the slaves? One complication is that I have built an interface in my web platform that edits these configuration files (namely the file with the virtual hosts), and since that editing would take place on one of the slaves, the most current copy of the file could end up on just that slave.
I realize that this was a long and unwieldy post, and I apologize. I just wanted to make sure that all the parameters of my problem were expressed.
I hope someone out there can help me, as you have before! Thank you in advance!
On Linux, Apache commonly writes logs to /var/log/apache2 or /var/log/httpd, depending on your distribution and any VirtualHost overrides. In a CustomLog directive you can also give a format string (or a LogFormat nickname) after the filename, which applies that format only to that file.
The server error log, whose name and location is set by the ErrorLog directive, is the most important log file. This is the place where Apache httpd will send diagnostic information and record any errors that it encounters in processing requests.
I suggest not using NFS for logging, as it can be a real performance killer. Instead, use rsyslog with remote logging enabled. In your apache2.conf you can set up a LogFormat that includes the virtual host name and then pipe the log to rsyslog, telling it to write the output to a remote host.
In apache2.conf:
LogFormat "%v %{X-FORWARDED-FOR}i %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
CustomLog "|/usr/bin/logger -t apache2 -p local7.info" vhost_combined
In rsyslog.conf on the webserver:
local7.* @<remote host ip>
In rsyslog.conf on the remote host:
local7.* /var/log/webfrontends.log;precise
As for the Apache configuration files, we use NFS. apache2.conf is a link to a remote file (different files for different machines if needed), and in apache2.conf we use an Include directive to read specific site configurations (different dirs for different machines if needed).
On the NFS server, the exported dir /NFS_EXPORT/etc/apache2/ contains:
- webserver1_apache2.conf
- webserver2_apache2.conf
- webserver1_vhosts (dir)
- webserver2_vhosts (dir)
Both webserver1_apache2.conf and webserver2_apache2.conf contain Include "/etc/apache2/vhosts".
On WebServer 1:
ln -s /NFS_EXPORT/etc/apache2/webserver1_apache2.conf /etc/apache2/apache2.conf
ln -s /NFS_EXPORT/etc/apache2/webserver1_vhosts/ /etc/apache2/vhosts
On WebServer 2:
ln -s /NFS_EXPORT/etc/apache2/webserver2_apache2.conf /etc/apache2/apache2.conf
ln -s /NFS_EXPORT/etc/apache2/webserver2_vhosts/ /etc/apache2/vhosts
If all your webservers are the same in terms of hardware specs and serve the same sites/applications then there is no need to differentiate the configs.
Of course, you will need a script or some other mechanism to restart Apache on all your servers once you modify a configuration. Also, upgrading your Apache software can be tricky unless you have root access to your NFS exports, because typically your package management system will complain about not being able to modify some configuration files.
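A cluster-wide restart mechanism can be as small as an ssh loop. The sketch below is illustrative only: the host names are made up, key-based ssh access is assumed, and it starts in dry-run mode (writing its plan to a file) so it is safe to inspect before use.

```shell
#!/bin/sh
# Sketch: validate and gracefully reload Apache on every balancer member
# after the shared configuration changes. HOSTS and ssh access are
# assumptions; DRY_RUN=1 only records what would be run.
HOSTS="webserver1 webserver2"
DRY_RUN=1
PLAN=/tmp/reload-plan.txt
: > "$PLAN"

for h in $HOSTS; do
    # configtest first, so one bad vhost file does not take a node down
    cmd="apache2ctl configtest && apache2ctl graceful"
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "$h: $cmd" | tee -a "$PLAN"
    else
        ssh "$h" "$cmd" || echo "reload failed on $h" >&2
    fi
done
```

Using `graceful` rather than `restart` lets in-flight requests finish, which matters once the balancer is sending live traffic to every member.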
NFS will not help you with log files, for exactly the reasons you describe above. You should use syslogd (or some other solution like Splunk) to centralize the logging. It's trivial to include information about what host the log entry comes from, so you can still winnow down to per-host data when troubleshooting.
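Once the entries carry a host name, narrowing the combined file back down to one machine is a one-liner. A small sketch, with a made-up log file and host names (rsyslog's traditional file format puts the originating host in the fourth whitespace-separated field):

```shell
# Fake a centralized log in the traditional syslog layout:
# "Mon DD HH:MM:SS <host> <tag>: <message>"
cat > /tmp/webfrontends.log <<'EOF'
Jan  1 10:00:01 webserver1 apache2: www.example.com 203.0.113.7 GET /
Jan  1 10:00:02 webserver2 apache2: www.example.com 203.0.113.9 GET /
Jan  1 10:00:03 webserver1 apache2: shop.example.com 198.51.100.4 GET /cart
EOF

# Winnow down to entries from one host ($4 is the host field):
awk '$4 == "webserver1"' /tmp/webfrontends.log
```

The same filter works live against the growing file, e.g. `tail -F /tmp/webfrontends.log | awk '$4 == "webserver1"'`.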
Configuration files: you need to either centralize them (a "master" copy) or have a way of distributing changes made on any server to all the others. I recommend centralization as the simpler approach. NFS will do the job here, or, as you suggest, a repository from which all hosts are periodically updated. There are a lot of options, running all the way up to version control (SVN, Git, etc.) or configuration management tools (Chef, Puppet, etc.).
Please note that moving from a single server to a cluster has many implications. In both cases above (logging, config files), there is potential to introduce single points of failure if done naively. Since you have that already (one server), you're not worse off, but you should try to be aware of and plan for the failure scenarios you may need to respond to.