What exactly are the benefits of using a PHP 5 DirectoryIterator over PHP 4 "opendir/readdir/closedir"?

Question

What exactly are the benefits of using a PHP 5 DirectoryIterator

$dir = new DirectoryIterator(dirname(__FILE__));
foreach ($dir as $fileinfo) 
{
    // handle what has been found
}

over a PHP 4 "opendir/readdir/closedir"

if($handle = opendir(dirname(__FILE__))) 
{
    while (false !== ($file = readdir($handle))) 
    {
        // handle what has been found
    }
    closedir($handle);
}

besides the subclassing options that come with OOP?

Manu Manjunath · Accepted Answer

To understand the difference between the two, let's write two functions that read the contents of a directory into an array: one using the procedural method and the other the object-oriented one.

Procedural, using opendir/readdir/closedir

function list_directory_p($dirpath) {
    if (!is_dir($dirpath) || !is_readable($dirpath)) {
        error_log(__FUNCTION__ . ": Argument should be a path to valid, readable directory (" . var_export($dirpath, true) . " provided)");
        return null;
    }
    $paths = array();
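    $dir = realpath($dirpath);
    $dh = opendir($dir);
    if ($dh === false) {
        // opendir() can still fail after the checks above (e.g. a race)
        return null;
    }
    while (false !== ($f = readdir($dh))) {
        // Skip the "." and ".." pseudo-entries
        if ($f !== '.' && $f !== '..') {
            $paths[] = $dir . DIRECTORY_SEPARATOR . $f;
        }
    }
    closedir($dh);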
    return $paths;
}

Object Oriented, using DirectoryIterator

function list_directory_oo($dirpath) {
    if (!is_dir($dirpath) || !is_readable($dirpath)) {
        error_log(__FUNCTION__ . ": Argument should be a path to valid, readable directory (" . var_export($dirpath, true) . " provided)");
        return null;
    }
    $paths = array();
    $dir = realpath($dirpath);
    $di = new DirectoryIterator($dir);
    foreach ($di as $fileinfo) {
        if (!$fileinfo->isDot()) {
            $paths[] = $fileinfo->getRealPath();
        }
    }
    return $paths;
}

Performance

Let's assess their performance first:

// Assumed value: the original snippet does not show how $num_iterations was set.
$num_iterations = 1000;

$start_t = microtime(true);
for ($i = 0; $i < $num_iterations; $i++) {
    $paths = list_directory_oo(".");
}
$end_t = microtime(true);
// Average time per call, in microseconds
$time_diff_micro = (($end_t - $start_t) * 1000000) / $num_iterations;
echo "Time taken per call (list_directory_oo) = " . round($time_diff_micro / 1000, 2) . "ms (" . count($paths) . " files)\n";

$start_t = microtime(true);
for ($i = 0; $i < $num_iterations; $i++) {
    $paths = list_directory_p(".");
}
$end_t = microtime(true);
$time_diff_micro = (($end_t - $start_t) * 1000000) / $num_iterations;
echo "Time taken per call (list_directory_p) = " . round($time_diff_micro / 1000, 2) . "ms (" . count($paths) . " files)\n";

On my laptop (Win 7 / NTFS), the procedural method seems to be the clear winner:

C:\code>"C:\Program Files (x86)\PHP\php.exe" list_directory.php
Time taken per call (list_directory_oo) = 4.46ms (161 files)
Time taken per call (list_directory_p) = 0.34ms (161 files)

On an entry-level AWS machine (CentOS):

[~]$ php list_directory.php
Time taken per call (list_directory_oo) = 0.84ms (203 files)
Time taken per call (list_directory_p) = 0.36ms (203 files)

The results above are from PHP 5.4. You'll see similar results using PHP 5.3 and 5.2, and when PHP is running under Apache or NGINX.

Code Readability

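Although it is slower, the code using DirectoryIterator is more readable: there is no handle to open and close, and isDot() replaces the string comparisons against '.' and '..'.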

File reading order

Both methods return the directory contents in exactly the same order. That is, if list_directory_oo returns array('h', 'a', 'g'), list_directory_p also returns array('h', 'a', 'g').

Extensibility

The two functions above demonstrate performance and readability. Note that if your code needs to do further operations on each entry, the code using DirectoryIterator is more extensible.

For example, in list_directory_oo above, the $fileinfo object provides a number of methods such as getMTime(), getOwner() and isReadable() (the return values of most of which are cached and do not require system calls).

Therefore, depending on your use case (that is, what you intend to do with each child element of the input directory), it's possible that code using DirectoryIterator performs as well as, or sometimes better than, code using opendir.

You can modify the code of list_directory_oo and test it yourself.
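As a starting point for such a test, here is a minimal sketch of a variant that collects metadata for each entry. The function name list_directory_meta and the choice of fields are illustrative assumptions, not part of the benchmark above; only standard SplFileInfo methods are used.

```php
<?php
// Hypothetical helper (not from the benchmark above): gather per-entry
// metadata while iterating, using only SplFileInfo methods.
function list_directory_meta($dirpath) {
    $rows = array();
    foreach (new DirectoryIterator($dirpath) as $fileinfo) {
        if ($fileinfo->isDot()) {
            continue;
        }
        // Skip broken symlinks and special files; getSize()/getMTime()
        // would throw a RuntimeException on entries that cannot be stat'ed.
        if (!$fileinfo->isFile() && !$fileinfo->isDir()) {
            continue;
        }
        $rows[] = array(
            'name'     => $fileinfo->getFilename(),
            'size'     => $fileinfo->getSize(),
            'mtime'    => $fileinfo->getMTime(),
            'readable' => $fileinfo->isReadable(),
            'is_dir'   => $fileinfo->isDir(),
        );
    }
    return $rows;
}

// Print a simple listing of the directory containing this script.
foreach (list_directory_meta(dirname(__FILE__)) as $row) {
    printf("%-30s %10d bytes  %s\n", $row['name'], $row['size'], gmdate('Y-m-d H:i', $row['mtime']));
}
```

A procedural equivalent would need separate filesize(), filemtime(), is_readable() and is_dir() calls on a path you build yourself, so timing both versions against each other gives a fairer comparison for this kind of use case.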

Summary

The decision of which to use depends entirely on the use case.

If I were to write a cronjob in PHP that recursively scans a directory (and its subdirectories) containing thousands of files and performs some operation on them, I would choose the procedural method.

But if my requirement is to write a sort of web-interface to display uploaded files (say in a CMS) and their metadata, I would choose DirectoryIterator.

You can choose based on your needs.

Levi Morrison · Answer

Benefit 1: You can hide away all the boring details.

When using iterators you generally define them somewhere else, so real-life code would look something more like:

// ImageFinder is an abstraction over an Iterator
$images = new ImageFinder($base_directory);
foreach ($images as $image) {
    // application logic goes here.
}

The specifics of iterating through directories, sub-directories and filtering out unwanted items are all hidden from the application. That's probably not the interesting part of your application anyway, so it's nice to be able to hide those bits away somewhere else.
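To make that concrete, here is one way such an ImageFinder could be defined. This is only a sketch: the class body, the extension list and the recursive behaviour are assumptions for illustration, not code from a real library. It needs PHP 5.3 or later for FilesystemIterator::SKIP_DOTS.

```php
<?php
// Hypothetical sketch of an ImageFinder; the extension list and the
// recursive traversal are illustrative assumptions.
class ImageFinder extends FilterIterator
{
    private $extensions = array('jpg', 'jpeg', 'png', 'gif');

    public function __construct($base_directory)
    {
        // Walk the whole tree below $base_directory, skipping "." and "..".
        $files = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($base_directory, FilesystemIterator::SKIP_DOTS)
        );
        parent::__construct($files);
    }

    // Keep only regular files whose extension is in the list.
    // The next line is an attribute on PHP 8 (silences the return-type
    // deprecation notice) and a plain comment on older versions.
    #[\ReturnTypeWillChange]
    public function accept()
    {
        $file = $this->current();
        $ext = strtolower(pathinfo($file->getFilename(), PATHINFO_EXTENSION));
        return $file->isFile() && in_array($ext, $this->extensions, true);
    }
}
```

With a definition like this in place, the foreach loop shown earlier works unchanged, and each $image is an SplFileInfo object.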

Benefit 2: What you do with the result is separated from obtaining the result.

In the above example, you could swap out that specific iterator for another iterator and you don't have to change what you do with the result at all. This makes the code a bit easier to maintain and add new features to later on.
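A small sketch of that separation follows. The function total_size is a hypothetical consumer invented for this illustration; it accepts any Traversable that yields SplFileInfo objects, so the source can change without touching it.

```php
<?php
// Hypothetical consumer: it relies only on receiving something iterable
// that yields SplFileInfo objects, not on how those objects were found.
function total_size(Traversable $files)
{
    $bytes = 0;
    foreach ($files as $file) {
        // isFile() is false for directories, dot entries and missing paths.
        if ($file->isFile()) {
            $bytes += $file->getSize();
        }
    }
    return $bytes;
}

// Source 1: every entry in the directory containing this script.
echo total_size(new DirectoryIterator(dirname(__FILE__))), "\n";

// Source 2: a fixed list, e.g. for a unit test; the consumer is unchanged.
$fixed = new ArrayIterator(array(new SplFileInfo(__FILE__)));
echo total_size($fixed), "\n";
```

The same swap is much harder with an opendir/readdir loop, because the traversal and the per-file logic live in the same block.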
