
Calling a function with explicit parameters vs. call_user_func_array()

Tags:

php

I saw a piece of code earlier this week (which, unfortunately, I am unable to retrieve) and I am curious about the way the author went about implementing the __call() magic method. The code looked something like the following:

class Sample
{
    protected function test()
    {
        var_dump(func_get_args());
    }
    public function __call($func, $args)
    {
        if(!method_exists($this, $func))
        {
            return null;
        }
        switch(count($args))
        {
            case 0:
                return $this->$func();
            case 1:
                return $this->$func($args[0]);
            case 2:
                return $this->$func($args[0], $args[1]);
            case 3:
                return $this->$func($args[0], $args[1], $args[2]);
            case 4:
                return $this->$func($args[0], $args[1], $args[2], $args[3]);
            case 5:
                return $this->$func($args[0], $args[1], $args[2], $args[3], $args[4]);
            default:
                return call_user_func_array(array($this, $func), $args);
        }
    }
}
$obj = new Sample();
$obj->test("Hello World"); // Would be called via switch label 1

As you can see, the author could have just used call_user_func_array() and dropped the switch entirely, which leads me to believe there was (hopefully) some intelligent reasoning behind this.

The only reason I could think of would be perhaps some overhead of the function call to call_user_func_array(), but that doesn't seem like a good enough reason to use a bunch of case statements. Is there an angle here I don't seem to be getting?

Tim Cooper asked Dec 01 '11 15:12


2 Answers

The reason is that there is overhead on call_user_func_array. It has the overhead of an additional function call. Typically this is in the range of microseconds, but it can become important in two cases:

  1. Recursive Function Calls

    Since it adds another call to the stack, it will double the amount of stack usage. So you can run into issues (with xdebug, or memory constraints) which will cause your application to crash if you run out of stack. In applications (or parts of them), using this style of approach can reduce your stack usage by as much as 33% (which can be the difference between an application running and crashing).

  2. Performance

    If you're calling the function a lot, then those microseconds can add up significantly. Since this is in a framework (it looks like something done by Lithium), it will likely be called tens, hundreds or even thousands of times in the lifetime of the application. So, even though each individual call is a micro-optimization, the cumulative effect is not.

So yes, you can remove the switch and replace it with call_user_func_array and it will be 100% the same with respect to functionality. But you'll lose the two optimization benefits mentioned above.
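As a minimal sketch of that functionally equivalent version, the switch can be collapsed into a single call_user_func_array() call. The class name SampleSimple is hypothetical and only mirrors the Sample class from the question; test() returns its arguments here (rather than dumping them) so the result can be inspected.

```php
<?php
// Hypothetical class, modelled on the Sample class from the question.
class SampleSimple
{
    protected function test()
    {
        // Return the received arguments so the caller can inspect them.
        return func_get_args();
    }

    public function __call($func, $args)
    {
        if (!method_exists($this, $func)) {
            return null;
        }
        // One generic dispatch instead of one case per argument count.
        return call_user_func_array(array($this, $func), $args);
    }
}

$obj = new SampleSimple();
$result = $obj->test("Hello World"); // test() is protected, so __call() handles it
var_dump($result);
```

On PHP 5.6 and later, argument unpacking ($this->$func(...$args)) is another way to write the same dispatch, though it was not available on the 5.3 versions discussed here.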

EDIT And to prove the performance difference:

I decided to do a benchmark myself. Here's a link to the exact source that I used:

http://codepad.viper-7.com/s32CSb (also included at the bottom of this answer for reference)

Now, I tested it on a Linux system, a Windows system and codepad's site (two from the command line, one online, plus one Windows run with XDebug enabled), all running PHP 5.3.6 or 5.3.8.

Conclusion

Since the results are rather long, I'll summarize first.

If you're calling this a lot, it's not a micro-optimization to do this. Sure, for an individual call the difference is insignificant. But if it's going to be used a lot, it can save quite a bit of time.

Now, it's worth noting that all except one of these tests are run with XDebug off. This is extremely important, as xdebug appears to significantly alter the results of the benchmark.

Here are the raw results (test1 and testObj1 use call_user_func_array; test2 and testObj2 use the switch):

Linux

With 0 Arguments:
test1 in 0.0898239612579 Seconds
test2 in 0.0540208816528 Seconds
testObj1 in 0.118539094925 Seconds
testObj2 in 0.0492739677429 Seconds

With 1 Arguments:
test1 in 0.0997269153595 Seconds
test2 in 0.053689956665 Seconds
testObj1 in 0.137704849243 Seconds
testObj2 in 0.0436580181122 Seconds

With 2 Arguments:
test1 in 0.0883569717407 Seconds
test2 in 0.0551269054413 Seconds
testObj1 in 0.115921974182 Seconds
testObj2 in 0.0550417900085 Seconds

With 3 Arguments:
test1 in 0.0809321403503 Seconds
test2 in 0.0630970001221 Seconds
testObj1 in 0.124716043472 Seconds
testObj2 in 0.0640230178833 Seconds

With 4 Arguments:
test1 in 0.0859131813049 Seconds
test2 in 0.0723040103912 Seconds
testObj1 in 0.137611865997 Seconds
testObj2 in 0.0707349777222 Seconds

With 5 Arguments:
test1 in 0.109707832336 Seconds
test2 in 0.122457027435 Seconds
testObj1 in 0.201376914978 Seconds
testObj2 in 0.217674016953 Seconds

(I actually ran it about a dozen times, and the results are consistent.) So, you can clearly see that on that system, it's significantly faster to use the switch for functions with 3 or fewer arguments. For 4 arguments, it's close enough to qualify as a micro-optimization. For 5 it's slower (due to the overhead of the switch statement, since the benchmark falls through to call_user_func_array at that point).

Now, objects are another story. For objects, it's significantly faster to use the switch statement even with 4 arguments. And the 5-argument case is slightly slower.
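To put those numbers in per-call terms, here is a small sketch of the arithmetic using the Linux figures for 0 arguments and the iteration count from the benchmark source. The variable names are only illustrative. It works out to roughly a 1.66x speedup, or about 0.36 microseconds saved per call.

```php
<?php
// Figures copied from the Linux results above (0 arguments, plain functions).
$iterations      = 100000;          // iteration count used by the benchmark
$viaCallUserFunc = 0.0898239612579; // test1: call_user_func_array
$viaSwitch       = 0.0540208816528; // test2: switch

// Relative speedup of the switch over call_user_func_array.
$speedup = $viaCallUserFunc / $viaSwitch;

// Absolute saving per call, converted from seconds to microseconds.
$savedPerCallMicroseconds = ($viaCallUserFunc - $viaSwitch) / $iterations * 1e6;

printf("Speedup: %.2fx\n", $speedup);
printf("Saved per call: %.3f microseconds\n", $savedPerCallMicroseconds);
```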

Windows

With 0 Arguments:
test1 in 0.078088998794556 Seconds
test2 in 0.040416955947876 Seconds
testObj1 in 0.092448949813843 Seconds
testObj2 in 0.044382095336914 Seconds

With 1 Arguments:
test1 in 0.084033012390137 Seconds
test2 in 0.049020051956177 Seconds
testObj1 in 0.098193168640137 Seconds
testObj2 in 0.055608987808228 Seconds

With 2 Arguments:
test1 in 0.092596054077148 Seconds
test2 in 0.059282064437866 Seconds
testObj1 in 0.10753011703491 Seconds
testObj2 in 0.06486701965332 Seconds

With 3 Arguments:
test1 in 0.10003399848938 Seconds
test2 in 0.073707103729248 Seconds
testObj1 in 0.11481595039368 Seconds
testObj2 in 0.072822093963623 Seconds

With 4 Arguments:
test1 in 0.10518193244934 Seconds
test2 in 0.076627969741821 Seconds
testObj1 in 0.1221661567688 Seconds
testObj2 in 0.080114841461182 Seconds

With 5 Arguments:
test1 in 0.11016392707825 Seconds
test2 in 0.14898705482483 Seconds
testObj1 in 0.13080286979675 Seconds
testObj2 in 0.15970706939697 Seconds

Again, just as with Linux, it's faster for every case except 5 arguments (which is expected). So nothing out of the ordinary here.

Codepad

With 0 Arguments:
test1 in 0.094165086746216 Seconds
test2 in 0.046183824539185 Seconds
testObj1 in 0.088129043579102 Seconds
testObj2 in 0.046132802963257 Seconds

With 1 Arguments:
test1 in 0.093621969223022 Seconds
test2 in 0.054486036300659 Seconds
testObj1 in 0.11912703514099 Seconds
testObj2 in 0.053775072097778 Seconds

With 2 Arguments:
test1 in 0.099776029586792 Seconds
test2 in 0.072152853012085 Seconds
testObj1 in 0.10576200485229 Seconds
testObj2 in 0.065294027328491 Seconds

With 3 Arguments:
test1 in 0.11053204536438 Seconds
test2 in 0.088426113128662 Seconds
testObj1 in 0.11045718193054 Seconds
testObj2 in 0.073081970214844 Seconds

With 4 Arguments:
test1 in 0.11662006378174 Seconds
test2 in 0.085783958435059 Seconds
testObj1 in 0.11683893203735 Seconds
testObj2 in 0.081549882888794 Seconds

With 5 Arguments:
test1 in 0.12763905525208 Seconds
test2 in 0.15642619132996 Seconds
testObj1 in 0.12538290023804 Seconds
testObj2 in 0.16010403633118 Seconds

This shows the same picture as with Linux. With 4 or fewer arguments, it's significantly faster to run it through the switch. With 5 arguments, it is significantly slower with the switch.

Windows With XDebug

With 0 Arguments:
test1 in 0.31674790382385 Seconds
test2 in 0.31161189079285 Seconds
testObj1 in 0.40747404098511 Seconds
testObj2 in 0.32526516914368 Seconds

With 1 Arguments:
test1 in 0.32827591896057 Seconds
test2 in 0.33025598526001 Seconds
testObj1 in 0.38013815879822 Seconds
testObj2 in 0.3494348526001 Seconds

With 2 Arguments:
test1 in 0.33168315887451 Seconds
test2 in 0.35207295417786 Seconds
testObj1 in 0.37523794174194 Seconds
testObj2 in 0.38242697715759 Seconds

With 3 Arguments:
test1 in 0.33901619911194 Seconds
test2 in 0.36867690086365 Seconds
testObj1 in 0.41470503807068 Seconds
testObj2 in 0.3860080242157 Seconds

With 4 Arguments:
test1 in 0.35170817375183 Seconds
test2 in 0.39288783073425 Seconds
testObj1 in 0.39424705505371 Seconds
testObj2 in 0.39747595787048 Seconds

With 5 Arguments:
test1 in 0.37077689170837 Seconds
test2 in 0.59246301651001 Seconds
testObj1 in 0.41220307350159 Seconds
testObj2 in 0.60260510444641 Seconds

Now this tells a different story. In this case with XDebug enabled (but no coverage analysis, just the extension turned on), it's almost always slower to use the switch optimization. This is curious since many benchmarks are run on dev boxes with xdebug enabled. Yet production boxes usually don't run with xdebug. So it's a pure lesson in executing benchmarks in proper environments.
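One way to guard against this, sketched here as an assumption rather than anything the benchmark above does, is to check whether the extension is loaded before trusting any timings. The function name benchmarkEnvironmentWarning is hypothetical; extension_loaded() is the standard PHP function.

```php
<?php
// Hypothetical helper: returns a warning string if Xdebug is loaded,
// or null if the environment looks suitable for benchmarking.
function benchmarkEnvironmentWarning()
{
    if (extension_loaded('xdebug')) {
        return 'Xdebug is loaded; timings may not reflect production.';
    }
    return null;
}

$warning = benchmarkEnvironmentWarning();
if ($warning !== null) {
    echo $warning, "\n";
}
```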

Source

<?php

function benchmark($callback, $iterations, $args) {
    $st = microtime(true);
    $callback($iterations, $args);
    $et = microtime(true);
    $time = $et - $st;
    return $time;
}

function test() {

}

function test1($iterations, $args) {
    $func = 'test';
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func_array($func, $args);
    }
}

function test2($iterations, $args) {
    $func = 'test';
    for ($i = 0; $i < $iterations; $i++) {
        switch (count($args)) {
            case 0:
                $func();
                break;
            case 1:
                $func($args[0]);
                break;
            case 2:
                $func($args[0], $args[1]);
                break;
            case 3:
                $func($args[0], $args[1], $args[2]);
                break;
            case 4:
                $func($args[0], $args[1], $args[2], $args[3]);
                break;
            default:
                call_user_func_array($func, $args);
        }
    }
}

class Testing {

    public function test() {

    }

    public function test1($iterations, $args) {
        for ($i = 0; $i < $iterations; $i++) {
            call_user_func_array(array($this, 'test'), $args);
        }
    }

    public function test2($iterations, $args) {
        $func = 'test';
        for ($i = 0; $i < $iterations; $i++) {
            switch (count($args)) {
                case 0:
                    $this->$func();
                    break;
                case 1:
                    $this->$func($args[0]);
                    break;
                case 2:
                    $this->$func($args[0], $args[1]);
                    break;
                case 3:
                    $this->$func($args[0], $args[1], $args[2]);
                    break;
                case 4:
                    $this->$func($args[0], $args[1], $args[2], $args[3]);
                    break;
                default:
                    call_user_func_array(array($this, $func), $args);
            }
        }
    }

}

function testObj1($iterations, $args) {
    $obj = new Testing;
    $obj->test1($iterations, $args);
}

function testObj2($iterations, $args) {
    $obj = new Testing;
    $obj->test2($iterations, $args);
}

$iterations = 100000;

$results = array('test1' => array(), 'test2' => array(), 'testObj1' => array(), 'testObj2' => array());
foreach ($results as $callback => &$result) {
    $args = array();
    for ($i = 0; $i < 6; $i++) {
        $result[$i] = benchmark($callback, $iterations, $args);
        $args[] = 'abcdefghijklmnopqrstuvwxyz';
    }
}
unset($result);
$merged = array(0 => array(), 1 => array(), 2 => array(), 3 => array(), 4 => array());

foreach ($results as $callback => $result) {
    foreach ($result as $args => $time) {
        $merged[$args][$callback] = $time;
    }
}

foreach ($merged as $args => $matrix) {
    echo "With $args Arguments:<br />";
    foreach ($matrix as $callback => $time) {
        echo "$callback in $time Seconds<br />";
    }
    echo "<br />";
}
ircmaxell answered Oct 30 '22 12:10


You can find this in the phpsavant template classes. PMJ got a tip about how slow call_user_func*() is and figured that 90% of the work would be handled much faster by covering the first five params explicitly. Anything else would be handled the slow way. I can't find the post with the discussion about how, but this is the page where he identifies the problem: http://paul-m-jones.com/archives/182

Adrian answered Oct 30 '22 12:10