Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is micro-optimization worth the time?

I am a PHP developer and I have always thought that micro-optimizations are not worth the time. If you really need that extra performance, you would either write your software so that it's architecturally faster, or you write a C++ extension to handle slow tasks (or better yet, compile the code using HipHop). However, today a work mate told me that there is a big difference in

is_array($array) 

and

$array === (array) $array 

and I was like "eh, that's a pointless comparison really", but he wouldn't agree with me.. and he is the best developer in our company and is taking charge of a website that does about 50 million SQL queries per day -- for instance. So, I am wondering here: could he be wrong or is micro-optimization really worth the time and when?

like image 549
Tower Avatar asked Aug 12 '10 18:08

Tower


2 Answers

Well, for a trivially small array, $array === (array) $array is significantly faster than is_array($array). On the order of over 7 times faster. But each call is only on the order of 1.0 x 10 ^ -6 seconds (0.000001 seconds). So unless you're calling it literally thousands of times, it's not going to be worth it. And if you are calling it thousands of times, I'd suggest you're doing something wrong...

The difference comes when you're dealing with a large array. Since $array === (array) $array requires a new variable to be copied requires the array to be iterated over internally for the comparison, it'll likely be SIGNIFICANTLY slower for a large array. For example, on an array with 100 integer elements, is_array($array) is within a margin of error (< 2%) of is_array() with a small array (coming in at 0.0909 seconds for 10,000 iterations). But $array = (array) $array is extremely slow. For only 100 elements, it's already over twice as slow as is_array() (coming in at 0.203 seconds). For 1000 elements, is_array stayed the same, yet the cast comparison increased to 2.0699 seconds...

The reason it's faster for small arrays is that is_array() has the overhead of being a function call, where the cast operation is a simple language construct... And iterating over a small variable (in C code) will typically be cheaper than the function call overhead. But, for larger variables, the difference grows...

It's a tradeoff. If the array is small enough, iteration will be more efficient. But as the size of the array grows, it will becomes increasingly slower (and hence the function call will become faster).

Another Way To Look At It

Another way to look at it would be to examine the algorithmic complexity of each cast.

Let's take a look at is_array() first. It's source code basically shows it's an O(1) operation. Meaning it's a constant time operation. But we also need to look at the function call. In PHP, function calls with a single array parameter are either O(1) or O(n) depending on if copy-on-write needs to be triggered. If you call is_array($array) when $array is a variable reference, copy-on-write will be triggered and a full copy of the variable will occur.

So therefore is_array() is a best case O(1) and worst-case O(n). But as long as you're not using references, it's always O(1)...

The cast version, on the other hand, does two operations. It does a cast, then it does an equality check. So let's look at each separately. The cast operator handler first forces a copy of the input variable. No matter if it's a reference or not. So simply using the (array) casting operator forces an O(n) iteration over the array to cast it (via the copy_ctor call).

Then, it converts the new copy to an array. This is O(1) for arrays and primitives, but O(n) for objects.

Then, the identical operator executes. The handler is just a proxy to the is_identical_function(). Now, is_identical will short-circuit if $array is not an array. Therefore, it has a best case of O(1). But if $array is an array, it can short-circuit again if the hash tables are identical (meaning both variables are copy-on-write copies of each other). So that case is O(1) as well. But remember that we forced a copy above, so we can't do that if it's an array. So it's O(n) thanks to zend_hash_compare...

So the end result is this table of worst-case runtime:

+----------+-------+-----------+-----------+---------------+ |          | array | array+ref | non-array | non-array+ref | +----------+-------+-----------+-----------+---------------+ | is_array |  O(1) |    O(n)   |    O(1)   |     O(n)      | +----------+-------+-----------+-----------+---------------+ | (array)  |  O(n) |    O(n)   |    O(n)   |     O(n)      | +----------+-------+-----------+-----------+---------------+ 

Note that it looks like they scale the same for references. They don't. They both scale linearly for referenced variables. But the constant factor changes. For example, in a referenced array of size 5, is_array will perform 5 memory allocations, and 5 memory copies, followed by 1 type check. The cast version, on the other hand, will perform 5 memory allocations, 5 memory copies, followed by 2 type checks, followed by 5 type checks and 5 equality checks (memcmp() or the like). So n=5 yields 11 ops for is_array, yet 22 ops for ===(array)...

Now, is_array() does have the O(1) overhead of a stack push (due to the function call), but that'll only dominate runtime for extremely small values of n (we saw in the benchmark above just 10 array elements was enough to completely eliminate all difference).

The Bottom Line

I'd suggest going for readability though. I find is_array($array) to be far more readable than $array === (array) $array. So you get the best of both worlds.

The script I used for the benchmark:

$elements = 1000; $iterations = 10000;  $array = array(); for ($i = 0; $i < $elements; $i++) $array[] = $i;  $s = microtime(true); for ($i = 0; $i < $iterations; $i++) is_array($array); $e = microtime(true); echo "is_array completed in " . ($e - $s) ." Seconds\n";  $s = microtime(true); for ($i = 0; $i < $iterations; $i++) $array === (array) $array; $e = microtime(true); echo "Cast completed in " . ($e - $s) ." Seconds\n"; 

Edit: For the record, these results were with 5.3.2 on Linux...

Edit2: Fixed the reason the array is slower (it's due to the iterated comparison instead of memory reasons). See compare_function for the iteration code...

like image 68
ircmaxell Avatar answered Oct 08 '22 18:10

ircmaxell


Micro-optimisation is worth it when you have evidence that you're optimising a bottleneck.

Usually it's not worth it - write the most readable code you can, and use realistic benchmarks to check the performance. If and when you find you've got a bottleneck, micro-optimise just that bit of code (measuring as you go). Sometimes a small amount of micro-optimisation can make a huge difference.

But don't micro-optimise all your code... it will end up being far harder to maintain, and you'll quite possibly find you've either missed the real bottleneck, or that your micro-optimisations are harming performance instead of helping.

like image 44
Jon Skeet Avatar answered Oct 08 '22 19:10

Jon Skeet