 

Sharing variables between child processes in PHP?

I'm sure what I'm trying is very simple, but I've never quite worked with multithreading before so I'm not sure where to start.

I'm using PCNTL to create a multithreaded PHP application. What I wish to do is have 3 functions running concurrently and I want their returned values merged into a single array. So logically I need either some variable shared among all children to which they append their results, or three variables shared only between a single child and the parent - then the parent can merge the results later.

Problem is - I have no idea how to do this. The first thing that comes to mind is using shared memory, but I feel like there should be an easier method.

Also, if it has any effect, the function which forks the process is a public class method. So my code looks something like the following:

<?php
    class multithreaded_search {
        /* ... */
        /* Constructors and such */
        /* ... */
        public function search( $string = '' ) {
            $search_types = array( 'tag', 'substring', 'levenshtein' );
            $pids = array();
            foreach( $search_types as $type ) {
                $pid = pcntl_fork();
                $pids[$pid] = $type;
                if( $pid == 0 ) { // child process
                    /* confusion */
                    $results = call_user_func( 'multithreaded_search::'.$type.'_search', $string );
                    /* What do we do with $results ? */
                }
            }
            for( $i = 0; $i < count( $pids ); $i++ ) {
                $pid = pcntl_wait();
                /* $pids[$pid] tells me the type of search that just finished */
                /* If we need to merge results in the parent, we can do it here */
            }
            /* Now all children have exited, so the search is complete */
            return $results;
        }
        private function tag_search( $string ) {
            /* perform one type of search */
            return $results;
        }
        private function substring_search( $string ) {
            /* perform one type of search */
            return $results;
        }
        private function levenshtein_search( $string ) {
            /* perform one type of search */
            return $results;
        }
    }
?>

So will I need to use shmop_open before I call pcntl_fork to create shared memory and save the results there, or do the children share class variables? Or do they only share global variables? I'm sure the answer is easy... I just don't know it.

Answers (for anyone who finds this)

I've got a few more years of experience, so I'll try to impart some knowledge.

First, there are two important distinctions to understand when it comes to implementing multiprocessing in your applications:

  • Threads versus processes versus forked processes
  • Shared memory versus message passing

Threads, processes, forked processes

  • Threads: Threads have very low overhead since they run in the same process as the parent and share the parent's memory address space. This means fewer OS calls are needed to create or destroy a thread, so threads are the "cheap" alternative if you plan to create and destroy them often. PHP does not have native support for threads. However, as of PHP 7.2 there are PHP extensions (written in C) that provide threaded functionality, for example pthreads, which requires a thread-safe (ZTS) build of PHP.
  • Processes: Processes have a much larger overhead because the operating system must allocate memory for each one, and in the case of interpreted languages like PHP, there's often a whole runtime that must be loaded and initialized before your own code executes. PHP does have native support for spawning processes via exec (synchronous) or proc_open (asynchronous).
  • Forked processes: A forked process splits the difference between these two approaches. The child is a separate process, but it starts as a copy of the current process's memory (the pages are shared copy-on-write until one side modifies them), so no new runtime has to be loaded. There is also native support for this via PCNTL.
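For comparison, here is a minimal sketch of the plain "process" approach using proc_open. It assumes the CLI SAPI (so PHP_BINARY points at the php executable) and PHP 7.4 or later for the array form of the command; run_in_child_process is a hypothetical helper name. Nothing is shared with the parent: the child's only way to hand back a result is to print it to the pipe.

```php
<?php
// Sketch: spawn a brand-new PHP interpreter and read what it prints.
// Assumes the CLI SAPI, where PHP_BINARY is the php executable.
function run_in_child_process(string $code): string
{
    $descriptors = [
        0 => ['pipe', 'r'], // child's stdin
        1 => ['pipe', 'w'], // child's stdout
    ];
    // Array form (PHP 7.4+) avoids shell quoting of $code.
    $proc = proc_open([PHP_BINARY, '-r', $code], $descriptors, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('proc_open failed');
    }
    fclose($pipes[0]);                        // nothing to send on stdin
    $output = stream_get_contents($pipes[1]); // blocks until the child closes stdout
    fclose($pipes[1]);
    proc_close($proc);                        // waits for the child to exit
    return $output;
}

echo run_in_child_process('echo 6 * 7;'), PHP_EOL; // prints 42
```

The cost the bullet describes is visible here: every call starts a whole interpreter, which is fine for an hourly batch job but wasteful per request.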

Choosing the proper tool for the job is often a matter of asking the question: "How often will you be spinning up additional threads/processes?" If it's not that often (maybe you run a batch job every hour and the job can be parallelized), then processes might be the easier solution. If every request that comes into your server requires some form of parallel computation and you receive 100 requests per second, then threads are likely the way to go.

Shared memory, message passing

  • Shared memory: This is when more than one thread or process is allowed to write to the same section of RAM. This has the benefit of being very fast and easy to understand; it's like a shared whiteboard in an office space. Anyone can read or write to it. However, it has several drawbacks when it comes to managing concurrency. Imagine two processes write to the exact same place in memory at the exact same time, and then a third process tries to read the result. Which result will it see? PHP has native support for shared memory via shmop, but using it correctly requires locks, semaphores, monitors, or other synchronization mechanisms.
  • Message passing: This is the "hot new thing"™ that has actually been around since the 1970s. The idea is that instead of writing to shared memory, you write into your own memory space and then tell the other threads/processes "hey, I have a message for you". The Go programming language has a famous motto related to this: "Don't communicate by sharing memory; share memory by communicating." There are many ways to pass messages, including writing to a file, writing to a socket, writing to stdout, writing to shared memory, etc.
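To answer the original shmop question directly, the simplest route is to create the segment before calling pcntl_fork so every child inherits it. The sketch below assumes the pcntl and shmop extensions and PHP 7.4+ (arrow functions); SLOT_SIZE and fork_and_collect are hypothetical names. It sidesteps the locking problem described above by giving each child its own fixed-size slot, so no two processes ever write the same bytes. If children had to update one shared structure, you would need a lock (for example a System V semaphore) around each write.

```php
<?php
// Sketch: requires the pcntl and shmop extensions (CLI on a Unix-like OS).
// SLOT_SIZE is an arbitrary, hypothetical per-child limit in bytes.
const SLOT_SIZE = 256;

// $jobs maps a name to a callable; each callable runs in its own child.
function fork_and_collect(array $jobs): array
{
    $names = array_keys($jobs);
    // Create the segment BEFORE forking so every child inherits the mapping.
    // Key 0 is IPC_PRIVATE on Linux: a fresh segment with no global name.
    $shm = shmop_open(0, 'c', 0600, SLOT_SIZE * count($names));
    if ($shm === false) {
        throw new RuntimeException('shmop_open failed');
    }
    $pids = [];
    foreach ($names as $slot => $name) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('pcntl_fork failed');
        }
        if ($pid === 0) {
            // Child: write "<4-byte length><serialized result>" into its own slot only.
            $data = serialize($jobs[$name]());
            if (strlen($data) > SLOT_SIZE - 4) {
                exit(1); // does not fit; the parent reports the failure
            }
            shmop_write($shm, pack('N', strlen($data)) . $data, $slot * SLOT_SIZE);
            exit(0);
        }
        $pids[$slot] = $pid;
    }
    $results = [];
    $failed = [];
    foreach ($pids as $slot => $pid) {
        // Wait for the child first, so its slot is completely written.
        pcntl_waitpid($pid, $status);
        if (!pcntl_wifexited($status) || pcntl_wexitstatus($status) !== 0) {
            $failed[] = $names[$slot];
            continue;
        }
        $offset = $slot * SLOT_SIZE;
        $length = unpack('N', shmop_read($shm, $offset, 4))[1];
        $results[$names[$slot]] = unserialize(shmop_read($shm, $offset + 4, $length));
    }
    shmop_delete($shm); // mark the segment for removal once all processes detach
    if ($failed) {
        throw new RuntimeException('children failed: ' . implode(', ', $failed));
    }
    return $results;
}

var_dump(fork_and_collect([
    'tag'         => fn() => 3,
    'substring'   => fn() => 5,
    'levenshtein' => fn() => 7,
]));
```

The fixed slot size is the price of avoiding locks: results that don't fit are rejected rather than silently truncated.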

A basic socket solution

First, I'll attempt to recreate my solution from 2012. @MarcB pointed me towards UNIX sockets. This page explicitly mentions fsockopen, which opens a socket as a file pointer. It also includes in the "See Also" section a link to socket_connect, which gives you a bit lower-level control over sockets.

At the time I likely spent a long time researching these socket_* functions until I got something working. This time I did a quick Google search for socket_create_pair and found this helpful link to get you started.

I've rewritten the code above to write the results to UNIX sockets and read them back in the parent process:

<?php
/*
 * I retained the same public API as my original StackOverflow question,
 * but instead of performing actual searches I simply return static data.
 * Requires the pcntl and sockets extensions (CLI on a Unix-like OS).
 */

class multithreaded_search {
    private $a, $b, $c;
    public function __construct($a, $b, $c) {
        $this->a = $a;
        $this->b = $b;
        $this->c = $c;
    }

    public function search( $string = '' ) {
        $search_types = array( 'tag', 'substring', 'levenshtein' );
        $pids = array();
        $sockets = array();
        foreach( $search_types as $type ) {
            /* Create a connected pair: the child writes to [0], the parent reads from [1] */
            if( !socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $sockets[$type]) ) {
                throw new RuntimeException('socket_create_pair failed');
            }
            $pid = pcntl_fork();
            if( $pid == -1 ) {
                throw new RuntimeException('pcntl_fork failed');
            }
            if( $pid == 0 ) { // child process
                socket_close($sockets[$type][1]); // the child only writes
                /* no more confusion: call the private method on $this */
                $results = call_user_func( array($this, $type.'_search'), $string );
                /* What do we do with $results ? Write them to a socket! */
                $data = serialize($results);
                while( $data !== '' ) { // socket_write may send only part of the data
                    $written = socket_write($sockets[$type][0], $data);
                    if( !$written ) {
                        exit(1);
                    }
                    $data = substr($data, $written);
                }
                socket_close($sockets[$type][0]); // EOF marks the end of the message
                exit(0);
            }
            socket_close($sockets[$type][0]); // the parent only reads
            $pids[$type] = $pid;
        }
        $results = array();
        foreach( $pids as $type => $pid ) {
            /* Read until EOF, so results of any size arrive intact */
            $data = '';
            while( ($chunk = socket_read($sockets[$type][1], 8192)) !== false && $chunk !== '' ) {
                $data .= $chunk;
            }
            socket_close($sockets[$type][1]);
            /* Reap the child only after draining its socket, so a large write can't deadlock */
            pcntl_waitpid($pid, $status);
            /* Keying by $type records which search produced each result */
            $results[$type] = unserialize($data);
        }
        /* Now all children have exited, so the search is complete */
        return $results;
    }

    private function tag_search( $string ) {
        return $this->a;
    }

    private function substring_search( $string ) {
        return $this->b;
    }

    private function levenshtein_search( $string ) {
        return $this->c;
    }
}

$instance = new multithreaded_search(3, 5, 7);
var_dump($instance->search());
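If the sockets extension isn't available, the same fork-plus-socket-pair pattern can be sketched with PHP's built-in stream functions: stream_socket_pair returns two connected stream resources, and stream_get_contents reads until the child closes its end, so results of any size come through. This assumes pcntl and PHP 7.4+; fork_with_stream_pairs and the $jobs callables are hypothetical stand-ins for the *_search methods above.

```php
<?php
// Sketch: fork + socket pair using core stream functions (no sockets extension).
// Requires pcntl; $jobs maps a search type to a hypothetical callable.
function fork_with_stream_pairs(array $jobs): array
{
    $children = [];
    foreach ($jobs as $type => $job) {
        $pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
        if ($pair === false) {
            throw new RuntimeException('stream_socket_pair failed');
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('pcntl_fork failed');
        }
        if ($pid === 0) {
            fclose($pair[1]); // the child only writes
            $data = serialize($job());
            // fwrite may write only part of the data, so loop until it's all sent.
            while ($data !== '') {
                $written = fwrite($pair[0], $data);
                if ($written === false || $written === 0) {
                    exit(1);
                }
                $data = substr($data, $written);
            }
            fclose($pair[0]); // EOF tells the parent the message is complete
            exit(0);
        }
        fclose($pair[0]); // the parent only reads
        $children[$type] = [$pid, $pair[1]];
    }
    $results = [];
    foreach ($children as $type => [$pid, $stream]) {
        // Drain the socket before reaping, so a large write can never deadlock.
        $results[$type] = unserialize(stream_get_contents($stream));
        fclose($stream);
        pcntl_waitpid($pid, $status);
    }
    return $results;
}

var_dump(fork_with_stream_pairs([
    'tag'         => fn() => 3,
    'substring'   => fn() => 5,
    'levenshtein' => fn() => 7,
]));
```

Closing the unused end in each process matters: the parent only sees EOF once every copy of the write end is closed.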

Notes

This solution uses forked processes and message passing over a local (in-memory) socket. Depending on your use case and setup, this may not be the best solution. For instance:

  • If you wish to split the processing among several separate servers and pass the results back to a central server, then socket_create_pair won't work. In this case you'll need to create a socket, bind it to an address and port, then call socket_listen to wait for results from the child servers. Furthermore, pcntl_fork wouldn't work in a multi-server environment, since a process space can't be shared among different machines.
  • If you're writing a command-line application and prefer to use threads, then you can either use pthreads or a third-party library that abstracts pthreads.
  • If you don't like digging through the weeds and just want simple multiprocessing without having to worry about the implementation details, look into a library like Amp/Parallel.
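The bind/listen/accept flow from the first note can be sketched on a single machine, with a forked child standing in for the remote "child server". This assumes the pcntl and sockets extensions; collect_over_tcp is a hypothetical name, and on a real multi-server setup the worker would run on another host and connect to the central server's reachable address instead of 127.0.0.1.

```php
<?php
// Sketch: a central server binds, listens, and accepts one worker's result.
// A forked child plays the worker here; requires pcntl and sockets.
function collect_over_tcp(string $payload): string
{
    $server = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    if ($server === false
        || !socket_bind($server, '127.0.0.1', 0) // port 0 lets the OS pick a free port
        || !socket_listen($server, 1)) {
        throw new RuntimeException('could not set up the listening socket');
    }
    socket_getsockname($server, $addr, $port);   // discover which port was chosen

    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('pcntl_fork failed');
    }
    if ($pid === 0) {
        // Worker: connect back to the central server and send its result.
        socket_close($server);
        $client = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
        socket_connect($client, $addr, $port);
        while ($payload !== '') {
            $written = socket_write($client, $payload);
            if ($written === false || $written === 0) {
                exit(1);
            }
            $payload = substr($payload, $written);
        }
        socket_close($client); // closing signals end-of-message
        exit(0);
    }

    // Central server: accept the connection and read until the worker closes it.
    $conn = socket_accept($server);
    $received = '';
    while (($chunk = socket_read($conn, 8192)) !== false && $chunk !== '') {
        $received .= $chunk;
    }
    socket_close($conn);
    socket_close($server);
    pcntl_waitpid($pid, $status);
    return $received;
}

var_dump(unserialize(collect_over_tcp(serialize(['tag' => 3, 'substring' => 5]))));
```

With several real workers you would call socket_accept in a loop (or use socket_select) instead of accepting exactly one connection.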
asked Jan 03 '12 02:01 by stevendesu


1 Answer

Forked children get their own dedicated copy of any memory they write to, as soon as they write to it; this is "copy-on-write". While shmop does provide access to a common memory location, the actual PHP variables and whatnot defined in the script are NOT shared between the children.

Doing $x = 7; in one child will not make the $x in the other children also become 7. Each child will have its own dedicated $x that is completely independent of everyone else's copy.
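A minimal sketch of that point, assuming the pcntl extension; value_seen_by_parent_after_child_writes is a hypothetical helper. The child assigns 7 to its copy of $x and exits, and the parent's $x is still 1 afterwards. The same applies to object properties (the "class variables" from the question), since the whole process memory is copied.

```php
<?php
// Sketch: shows that a forked child's assignments never reach the parent.
// Requires the pcntl extension (CLI on a Unix-like OS).
function value_seen_by_parent_after_child_writes(): int
{
    $x = 1;
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('pcntl_fork failed');
    }
    if ($pid === 0) {
        $x = 7;  // the child changes only its own private copy of $x
        exit(0);
    }
    pcntl_waitpid($pid, $status); // the child has definitely finished writing
    return $x;
}

var_dump(value_seen_by_parent_after_child_writes()); // int(1)
```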

answered Oct 20 '22 14:10 by Marc B