
Can Cron jobs be used to simulate multi-threading with PHP?

I have a MySQL database table filled with 1000+ records, let's say 5000. Each record has a processed boolean flag that defaults to false (0). What I'd like to do is have a PHP script run on cron every minute. Its code would be something like this:

<?php
process();

function process()
{
   // $this->db is the app's database wrapper (pseudocode)
   $sql = "SELECT id FROM items WHERE processed = '0' ORDER BY id ASC LIMIT 1";
   $result = $this->db->query($sql);

   if (! $result->has_rows())
     die;

   $id = $result->getSingle('id');
   processItem($id); // Will set processed to 1 after processing is done
   process();        // Recurse until no unprocessed rows remain
}
?>

It should be pretty clear what the above code does: it gets the id of the next unprocessed record, processes it, and then calls process() again, repeating until there are no more items to process, at which point execution stops.

By putting this script on cron to run every minute, I hope to have multiple instances all working on the items simultaneously, so that rather than one item at a time, 5-10+ items could be processed at once.
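For reference, the crontab entry I have in mind would be something like this (the script path is a placeholder):

* * * * * /usr/bin/php /path/to/process.php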

1) Is this going to work the way I'm planning it? Any suggestions for improvements / things to watch out for?

2) Should I have the script keep a counter of running instances, so that whenever a cron job starts it checks the counter and, if 50 (?) instances are already running, exits without processing? That might keep the server from crashing under too many processes using too much memory. Any thoughts?

asked Feb 27 '23 by Ali


2 Answers

I have a couple of things to say:

Firstly, you are using recursion to process multiple rows. This could lead to problems if you recurse too deeply; use a simple loop instead.

Secondly, do you know whether this code actually benefits from being run multiple times in parallel? If the machine is CPU-bound, it might not benefit from another thread. I suggest you test by hand how many concurrent instances work best: more threads do not always make things faster, and in some cases can actually slow everything down.

Finally, I would certainly put a limit on how many of these scripts can run concurrently. This can be achieved simply by ensuring each script runs for no longer than, say, 5 minutes, or by keeping a count of active scripts and making sure it never exceeds the maximum you determined from my second suggestion.
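For example, here is a minimal sketch of the counting approach using flock() on a fixed pool of lock files; the file paths and the cap of 10 are illustrative, not something from your setup:

<?php
// Sketch: allow at most $maxWorkers concurrent copies of this script.
// Each instance tries to grab one of the lock-file slots; if every
// slot is held, it exits immediately.
$maxWorkers = 10; // illustrative cap; tune per my second suggestion
$slot = null;

for ($i = 0; $i < $maxWorkers; $i++) {
    $fp = fopen("/tmp/item-worker-$i.lock", 'c'); // illustrative path
    if ($fp && flock($fp, LOCK_EX | LOCK_NB)) {
        $slot = $fp; // got a free slot; the lock is released when the script exits
        break;
    }
    if ($fp) {
        fclose($fp);
    }
}

if ($slot === null) {
    exit; // all slots busy; the next cron run will try again
}

// ... run the processing loop here ...
?>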

Edit: I've added some more information about the problem recursion can cause. Each time you recursively call a function, extra space is used up on the stack. This space stores the local variables as well as the return address (allowing the state to be restored when the called function exits). The stack only has a finite amount of space, so eventually your program will crash with a stack overflow. Try running this simple program:

function a($i) { 
   print $i . "\n"; 
   a($i + 1);
}
a(0);

On my system this crashes PHP after 608739 iterations, and the number could be a lot smaller in a more complex function. A simple loop does not have these overheads, so it does not have this problem.
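For comparison, the iterative equivalent runs in constant stack space, so it can count forever (or until you kill it):

$i = 0;
while (true) {
   print $i . "\n";
   $i++;
}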

answered Mar 02 '23 by bramp


The recursion doesn't seem necessary at all and, as bramp said, could lead to problems. Why not just:

$sql = "SELECT id FROM items WHERE processed = '0' ORDER BY id ASC LIMIT 1";

while ( ($result = $this->db->query($sql) && $result->has_rows() ) {
   processItem( $result->getSingle('id') );
}

However, I foresee larger problems here. If you are going to run this script every minute, what mechanism do you have in place to stop previously started instances that may still be running? You may end up processing the same ID more than once.
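One way to avoid that is to have each instance claim a row atomically before touching it. Here is a minimal sketch, assuming an extra nullable VARCHAR claimed column on items and plain PDO rather than your db wrapper; the connection details are placeholders:

<?php
// Sketch: atomically claim one unprocessed row so that two cron
// instances can never pick up the same id. Assumes a nullable
// VARCHAR `claimed` column on the items table.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass'); // placeholder credentials

$token = uniqid('', true); // unique marker for this worker

// Only one concurrent UPDATE can win a given row.
$pdo->prepare(
    "UPDATE items SET claimed = ?
     WHERE processed = 0 AND claimed IS NULL
     ORDER BY id ASC LIMIT 1"
)->execute([$token]);

// Fetch the row we just claimed, if any.
$stmt = $pdo->prepare("SELECT id FROM items WHERE claimed = ?");
$stmt->execute([$token]);

if ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    processItem($row['id']); // sets processed = 1 when done
}
?>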

If you absolutely require a (pseudo) multi-threaded approach, I suggest the following:

  1. Grab a range of, or all of, the unprocessed IDs, not just one at a time.
  2. Using the curl_multi_ family of functions, pass subsets of those IDs (groups of n) to another script that does the actual processing (see the sketch below).

This method gives you greater control over the whole process and avoids the unnecessary one-row-at-a-time querying for unprocessed IDs.
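As a rough sketch of step 2, the dispatcher could look something like this; the worker.php URL, its ids query parameter, and the batch size of 50 are all placeholders:

<?php
// Sketch: fire off parallel HTTP requests to a worker script, each
// carrying a batch of ids. worker.php and its query format are hypothetical.
$ids = [/* all unprocessed ids, fetched in a single query */];

$mh = curl_multi_init();
$handles = [];

foreach (array_chunk($ids, 50) as $batch) {
    $ch = curl_init('http://localhost/worker.php?ids=' . implode(',', $batch));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all requests concurrently until every one has finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

foreach ($handles as $ch) {
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>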

answered Mar 02 '23 by Justin Johnson