I'm having a problem with gearman workers running on multiple servers which i can't seem to solve.
The problem occurs when a worker server is taken offline, rather than the worker process being cancelled, and causes all other worker processes to error and fail.
Example with just 1 client and 2 workers -
Client:
$client = new GearmanClient ();
$client->addServer ('192.168.1.200');
$client->addServer ('192.168.1.201');
$job = $client->do ('generate_tile', serialize ($arrData));
Worker:
$worker = new GearmanWorker ();
$worker->addServer ('192.168.1.200');
$worker->addServer ('192.168.1.201');
$worker->addFunction ('generate_tile', 'generate_tile');
while (1)
{
if (!$worker->work ())
{
switch ($worker->returnCode ())
{
default:
echo "Error: " . $worker->returnCode () . ': ' . $worker->error () . "\n";
break;
}
}
}
function generate_tile ($job) { ... }
The worker code is being run on 2 separate servers. When every server is up and running both workers execute jobs as expected. When one of the worker processes is cancelled, the other worker executes all jobs as expected.
However, when the server with the cancelled worker process is shutdown and taken completely offline, requests to the client script hang and the remaining worker process does not pick up any jobs.
I get the following set of errors from the remaining worker process:
Error: 46: gearman_con_wait:timeout reached
Error: 46: gearman_con_wait:timeout reached
Error: 4: gearman_con_flush:write:110
Error: 46: gearman_con_wait:timeout reached
Error: 4: gearman_con_flush:write:113
Error: 4: gearman_con_flush:write:113
Error: 4: gearman_con_flush:write:113
....
When i start-up the other server, not starting the worker process on it, the remaining worker process immediately jumps into life and executes any remaining jobs.
It seems clear to me that i need some code in the worker process to cope with any servers that may be offline, however i cannot see how to do this.
Many thanks,
Andy
Our tests with multiple gearman servers shows that if the last server in the list (192.168.1.201 in your case) is taken down, the workers stop executing the way you are describing. (Also, the workers grab jobs from the last server. They process jobs on .200 only if on .201 there are no jobs).
It seems that this is a bug with the linked list in the gearman server, which is reported to be fixed multiple times, but with all available versions of gearman, the bug persist. Sorry, I know that's not a solution, but we had the same problem and didn't found a solution. (if someone can provide working solution for this problem, I agree to give large bounty)
Further to @Darhazer 's comment above. We found that as well and solved like thus :-
// Gearman workers show a strong preference for servers at the end of a list so randomize the order
$worker = new GearmanWorker();
$s2 = explode(",", Configure::read('workers.servers'));
shuffle($s2);
$servers = implode(",", $s2);
$worker->addServers($servers);
We run 6 to 10 workers at any time, and expire them after they've completed x requests.
I use this class, which keep track of which jobs work on which servers. It hasn't been thoroughly tested, just wrote it now. I've pasted an edited version, so there might be a typo or somesuch, but otherwise appears to solve the issue.
<?
class MyGearmanClient {
static $server = "server1,server2,server3";
static $server_array = false;
static $workingServers = false;
static $gmclient = false;
static $timeout = 5000;
static $defaultTimeout = 5000;
static function randomServer() {
return self::$server_array[rand(0, count(self::$server_array) -1)];
}
static function getServer($job = false) {
if (self::$server_array == false) {
self::$server_array = explode(",", self::$server);
self::$workingServers = array();
}
$serverList = array();
if ($job) {
if (array_key_exists($job, self::$workingServers)) {
foreach (self::$server_array as $server) {
if (array_key_exists($server, self::$workingServers[$job])) {
if (self::$workingServers[$job][$server]) {
$serverList[] = $server;
}
} else {
$serverList[] = $server;
}
}
if (count($serverList) == 0) {
# All servers have failed, need to insert all the servers again and retry.
$serverList = self::$workingServers[$job] = self::$server_array;
}
return $serverList[rand(0, count($serverList) - 1)];
} else {
return self::randomServer();
}
} else {
return self::randomServer();
}
}
static function serverWorked($server, $job) {
self::$workingServers[$job][$server] = $server;
}
static function serverFailed($server, $job) {
self::$workingServers[$job][$server] = false;
}
static function Connect($server = false, $job = false) {
if ($server) {
self::$server = self::getServer();
}
self::$gmclient= new GearmanClient();
self::$gmclient->setTimeout(self::$timeout);
# add the default job server
self::$gmclient->addServer($server = self::getServer($job));
return $server;
}
static function Destroy() {
self::$gmclient = false;
}
static function Client($name, $vars, $timeout = false) {
if (is_int($timeout)) {
self::$timeout = $timeout;
} else {
self::$timeout = self::$defaultTimeout;
}
do {
$server = self::Connect(false, $name);
$value = self::$gmclient->do($name, $vars);
$return_code = self::$gmclient->returnCode();
if (!$value) {
$error_message = self::$gmclient->error();
if ($return_code == 47) {
self::serverFailed($server, $name);
if (count(self::$server_array) > 1) {
// ADDED SINGLE SERVER LOOP AVOIDANCE // echo "Timeout on server $server, trying another server...\n";
continue;
} else {
return false;
}
}
echo "ERR: $error_message ($return_code)\n";
}
# printf("Worker has returned\n");
$short_value = substr($value, 0, 80);
switch ($return_code)
{
case GEARMAN_WORK_DATA:
echo "DATA: $short_value\n";
break;
case GEARMAN_SUCCESS:
self::serverWorked($server, $name);
break;
case GEARMAN_WORK_STATUS:
list($numerator, $denominator)= self::$gmclient->doStatus();
echo "Status: $numerator/$denominator\n";
break;
case GEARMAN_TIMEOUT:
// self::Connect();
// Fall through
default:
echo "ERR: $error_message " . self::$gmclient->error() . " ($return_code)\n";
break;
}
}
while($return_code != GEARMAN_SUCCESS);
$rv = unserialize($value);
return $rv["rv"];
}
}
# Example usage:
# $rv = MyGearmanClient::Client("Function", $args);
?>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With