PHP get path to every file in folder/subfolder into array? [duplicate]

Possible Duplicate:
PHP SPL RecursiveDirectoryIterator RecursiveIteratorIterator retrieving the full tree

I am not sure where to start, but I have to get the paths to all files in a folder, including everything in its subfolders. For example, if I had one folder containing five folders, and each of those had 10 MP3s in it, my array would have to hold 50 paths to these files.

Later, let's say I added one more folder that had 3 folders in it, and each of those folders had 10 images.

My code would now need to find 80 paths and store them in an array.

Does my question make sense?

UPDATE:

My desired output is to have all these paths stored in one array.

But I would "LOVE" the code to be dynamic, meaning that if I later add 10 more folders, each with 17 subfolders, and each folder holds a multitude of different content, the array should still hold the paths of all the files. I hope this makes sense.

Papa De Beau asked Sep 02 '12 06:09


3 Answers

What you are looking for is called recursive directory traversal. That means you go through all directories and list the subdirectories and files in them. If there is a subdirectory, it is traversed as well, and so on; that is what makes it recursive.

As you can imagine, this is a fairly common need when writing software, and PHP supports it. It offers RecursiveDirectoryIterator so that directories can be iterated recursively, and the standard RecursiveIteratorIterator to do the traversal. You can then easily access all files and directories with a simple iteration, for example via foreach:

$rootpath = '.';
$fileinfos = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($rootpath)
);
foreach($fileinfos as $pathname => $fileinfo) {
    if (!$fileinfo->isFile()) continue;
    var_dump($pathname);
}

This example first specifies the directory you want to traverse. I've taken the current one:

$rootpath = '.';

The next statement is a little long: it instantiates the directory iterator and then the iterator-iterator, so that the tree-like structure can be traversed in a single, flat loop:

$fileinfos = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($rootpath)
);

These $fileinfos are then iterated with a simple foreach:

foreach($fileinfos as $pathname => $fileinfo) {

Inside the loop, there is a test that skips all directories. It uses the SplFileInfo object that is being iterated over. That object is provided by the recursive directory iterator and has many helpful methods for working with files; for example, you can also get the file extension, the basename, and information about size and time.

if (!$fileinfo->isFile()) continue;
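The SplFileInfo methods mentioned above can be tried out with a small sketch, assuming PHP 7.0+. describeFiles() is a hypothetical helper, not part of PHP; the methods it calls (getBasename(), getExtension(), getSize(), getMTime()) are standard SplFileInfo methods.

```php
<?php
// Sketch (assumes PHP 7.0+): describe every regular file below $rootpath
// using SplFileInfo methods. describeFiles() is a hypothetical helper name.
function describeFiles(string $rootpath): array
{
    $fileinfos = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($rootpath, FilesystemIterator::SKIP_DOTS)
    );

    $rows = [];
    foreach ($fileinfos as $pathname => $fileinfo) {
        if (!$fileinfo->isFile()) {
            continue; // same check as in the loop above
        }
        $rows[$pathname] = [
            'basename'  => $fileinfo->getBasename(),  // e.g. "song.mp3"
            'extension' => $fileinfo->getExtension(), // e.g. "mp3"
            'size'      => $fileinfo->getSize(),      // size in bytes
            'mtime'     => $fileinfo->getMTime(),     // last modification, Unix timestamp
        ];
    }
    return $rows;
}
```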

Finally, I just output the pathname, that is, the complete path to the file, starting with the root path:

var_dump($pathname);

Example output (here on a Windows operating system):

string(12) ".\.buildpath"
string(11) ".\.htaccess"
string(33) ".\dom\xml-attacks\attacks-xml.php"
string(38) ".\dom\xml-attacks\billion-laughs-2.xml"
string(36) ".\dom\xml-attacks\billion-laughs.xml"
string(40) ".\dom\xml-attacks\quadratic-blowup-2.xml"
string(40) ".\dom\xml-attacks\quadratic-blowup-3.xml"
string(38) ".\dom\xml-attacks\quadratic-blowup.xml"
string(22) ".\dom\xmltree-dump.php"
string(25) ".\dom\xpath-list-tags.php"
string(22) ".\dom\xpath-search.php"
string(27) ".\dom\xpath-text-search.php"
string(29) ".\encrypt-decrypt\decrypt.php"
string(29) ".\encrypt-decrypt\encrypt.php"
string(26) ".\encrypt-decrypt\test.php"
string(13) ".\favicon.ico"
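Since the question asks for the paths in one array rather than printed output, here is a minimal sketch, assuming PHP 7.0+, that collects the pathnames instead of dumping them. listFiles() is a hypothetical name; the SKIP_DOTS flag is an addition that stops the . and .. entries from being visited at all.

```php
<?php
// Sketch (assumes PHP 7.0+): collect every regular file below $rootpath
// into one flat array. listFiles() is a hypothetical helper name.
function listFiles(string $rootpath): array
{
    $fileinfos = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($rootpath, FilesystemIterator::SKIP_DOTS)
    );

    $paths = [];
    foreach ($fileinfos as $pathname => $fileinfo) {
        if (!$fileinfo->isFile()) {
            continue; // skip anything that is not a regular file
        }
        $paths[] = $pathname; // the key is the path, prefixed with $rootpath
    }
    return $paths;
}
```

Calling `$paths = listFiles('.');` then gives one flat array. Adding more folders later needs no code changes, because the traversal follows whatever tree exists when the function is called.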

If a subdirectory is not accessible, the code above throws an exception. This behaviour can be controlled with flags when instantiating the RecursiveIteratorIterator; CATCH_GET_CHILD makes it skip such subdirectories instead:

$fileinfos = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('.'),
    RecursiveIteratorIterator::LEAVES_ONLY,
    RecursiveIteratorIterator::CATCH_GET_CHILD
);

I hope this was informative. You can also wrap this up into a class of your own, and you can provide a FilterIterator to move the decision of whether a file should be listed out of the foreach loop.
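A minimal sketch of both ideas, assuming PHP 7.0+: the hypothetical FileList class hides the iterator setup, and PHP's built-in CallbackFilterIterator (rather than a self-written FilterIterator subclass) holds the "files only" decision.

```php
<?php
// Sketch (assumes PHP 7.0+): FileList is a hypothetical class name.
class FileList implements IteratorAggregate
{
    private $rootpath;

    public function __construct(string $rootpath)
    {
        $this->rootpath = $rootpath;
    }

    public function getIterator(): Iterator
    {
        $all = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($this->rootpath, FilesystemIterator::SKIP_DOTS),
            RecursiveIteratorIterator::LEAVES_ONLY,
            RecursiveIteratorIterator::CATCH_GET_CHILD // skip unreadable subdirectories
        );

        // The "list this entry or not" decision lives in the filter,
        // not in the caller's foreach loop.
        return new CallbackFilterIterator($all, function (SplFileInfo $fileinfo): bool {
            return $fileinfo->isFile();
        });
    }

    public function toArray(): array
    {
        // Keys are pathnames, so keep only the keys.
        return array_keys(iterator_to_array($this->getIterator()));
    }
}
```

A caller can then write `foreach (new FileList('.') as $pathname => $fileinfo)` or `(new FileList('.'))->toArray()` without repeating any of the setup.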


The power of the RecursiveDirectoryIterator and RecursiveIteratorIterator combination comes from its flexibility. What was not covered above are so-called FilterIterators. I thought I'd add another example that uses two self-written ones, nested inside each other to combine them.

  • One filters out all files and directories whose names start with a dot (those are considered hidden files on UNIX systems, so you should not give that information to the outside), and
  • the other filters the list to files only. That is the check that was previously inside the foreach.

Another change in this example is the use of getSubPathname(), which returns the subpath starting from the iteration's root path, which is the one you're looking for.

I also explicitly add the SKIP_DOTS flag, which prevents traversing . and .. (technically not necessary, because the filters would remove those entries anyway, but I think it is more correct), and the UNIX_PATHS flag, so that paths are always returned as Unix-style paths regardless of the underlying operating system. That is normally a good idea if those values are requested via HTTP later, as in your case:

$rootpath = '.';

$fileinfos = new RecursiveIteratorIterator(
    new FilesOnlyFilter(
        new VisibleOnlyFilter(
            new RecursiveDirectoryIterator(
                $rootpath,
                FilesystemIterator::SKIP_DOTS
                    | FilesystemIterator::UNIX_PATHS
            )
        )
    ),
    RecursiveIteratorIterator::LEAVES_ONLY,
    RecursiveIteratorIterator::CATCH_GET_CHILD
);

foreach ($fileinfos as $pathname => $fileinfo) {
    echo $fileinfos->getSubPathname(), "\n";
}

This example is similar to the previous one, although $fileinfos is configured a little differently. The part with the filters in particular is new:

    new FilesOnlyFilter(
        new VisibleOnlyFilter(
            new RecursiveDirectoryIterator($rootpath, ...)
        )
    ),

So the directory iterator is put into a filter and the filter itself is put into another filter. The rest did not change.

The code for these filters is pretty straightforward: each implements the accept() method, which returns true to keep the current entry or false to filter it out:

class VisibleOnlyFilter extends RecursiveFilterIterator
{
    // ": bool" matches FilterIterator::accept(); without it,
    // PHP 8.1+ emits a deprecation notice for this override
    public function accept(): bool
    {
        $fileName = $this->getInnerIterator()->current()->getFileName();
        $firstChar = $fileName[0];
        return $firstChar !== '.'; // reject hidden entries
    }
}

class FilesOnlyFilter extends RecursiveFilterIterator
{
    public function accept(): bool
    {
        $iterator = $this->getInnerIterator();

        // allow traversal
        if ($iterator->hasChildren()) {
            return true;
        }

        // filter entries, only allow true files
        return $iterator->current()->isFile();
    }
}
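For comparison, both filters can also be expressed with PHP's built-in RecursiveCallbackFilterIterator (available since PHP 5.4), which takes a closure instead of a subclass. A sketch, assuming PHP 7.0+; listVisibleFiles() is a hypothetical name:

```php
<?php
// Sketch (assumes PHP 7.0+): the VisibleOnlyFilter and FilesOnlyFilter
// checks combined in one closure. listVisibleFiles() is hypothetical.
function listVisibleFiles(string $rootpath): array
{
    $directory = new RecursiveDirectoryIterator(
        $rootpath,
        FilesystemIterator::SKIP_DOTS | FilesystemIterator::UNIX_PATHS
    );

    $filter = new RecursiveCallbackFilterIterator(
        $directory,
        function (SplFileInfo $current, $key, RecursiveIterator $iterator): bool {
            // Hide dot files and dot directories (.git, .svn, .htaccess, ...).
            if ($current->getFilename()[0] === '.') {
                return false;
            }
            // Let directories through so they can be traversed ...
            if ($iterator->hasChildren()) {
                return true;
            }
            // ... otherwise keep only regular files.
            return $current->isFile();
        }
    );

    $fileinfos = new RecursiveIteratorIterator(
        $filter,
        RecursiveIteratorIterator::LEAVES_ONLY,
        RecursiveIteratorIterator::CATCH_GET_CHILD
    );

    $paths = [];
    foreach ($fileinfos as $fileinfo) {
        $paths[] = $fileinfos->getSubPathname(); // relative to $rootpath
    }
    return $paths;
}
```

Subclasses remain the better choice when a filter is reused in many places; the closure keeps a one-off filter next to the code that uses it.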

And that's it. Naturally, you can reuse these filters in other cases too, e.g. with another kind of directory listing.

Here is another example output, with the $rootpath cut away:

test.html
test.rss
tests/test-pad-2.php
tests/test-pad-3.php
tests/test-pad-4.php
tests/test-pad-5.php
tests/test-pad-6.php
tests/test-pad.php
TLD/PSL/C/dkim-regdom.c
TLD/PSL/C/dkim-regdom.h
TLD/PSL/C/Makefile
TLD/PSL/C/punycode.pl
TLD/PSL/C/test-dkim-regdom.c
TLD/PSL/C/test-dkim-regdom.sh
TLD/PSL/C/tld-canon.h
TLD/PSL/generateEffectiveTLDs.php

No more traversal of .git or .svn directories, and no more listing of files like .buildpath or .project.


Note on FilesOnlyFilter and LEAVES_ONLY: the filter explicitly rejects directories and anything else that is not an existing regular file, based on the SplFileInfo object, so it is real filtering based on the file system. (isFile() follows symbolic links, so a link that points to a regular file is still accepted.)
Another way to get only non-directory entries ships with RecursiveIteratorIterator: the default LEAVES_ONLY mode (also used in the examples here). This mode is not a filter and does not look at the files themselves; it just specifies that the iteration should not return branches, that is, entries that have children (here: directories, in the case of the directory iterator).
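To see the difference between the modes, here is a sketch, assuming PHP 7.0+, that lists the same tree once with LEAVES_ONLY and once with SELF_FIRST. listEntries() is a hypothetical helper; it appends a slash to directories so that branches are visible in the result.

```php
<?php
// Sketch (assumes PHP 7.0+): compare RecursiveIteratorIterator modes.
// listEntries() is a hypothetical helper name.
function listEntries(string $rootpath, int $mode): array
{
    $entries = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator(
            $rootpath,
            FilesystemIterator::SKIP_DOTS | FilesystemIterator::UNIX_PATHS
        ),
        $mode
    );

    $paths = [];
    foreach ($entries as $fileinfo) {
        // Mark branches (directories) with a trailing slash.
        $paths[] = $entries->getSubPathname() . ($fileinfo->isDir() ? '/' : '');
    }
    sort($paths); // directory order is not guaranteed, so sort for readability
    return $paths;
}
```

For a tree with a file x.txt, a directory a containing y.mp3, and an empty directory named empty, LEAVES_ONLY yields only a/y.mp3 and x.txt, while SELF_FIRST also yields a/ and empty/.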

hakre answered Nov 16 '22 21:11


If you are on Linux and don't mind executing a shell command, you can do this all in one line:

$path = '/etc/php5/*'; // the shell expands this glob; to filter by extension, add find's -name '*.ext'
$files = explode("\n", trim(`find -L $path`)); // -L follows symlinks

print_r($files);

Output:

Array (
       [0] => /etc/php5/apache2
       [1] => /etc/php5/apache2/php.ini
       [2] => /etc/php5/apache2/conf.d
       [3] => /etc/php5/apache2/conf.d/gd.ini
       [4] => /etc/php5/apache2/conf.d/curl.ini
       [5] => /etc/php5/apache2/conf.d/mcrypt.ini
       etc...
      )
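If you go the shell route, a somewhat safer sketch (assuming a Unix-like system with find on the PATH and shell_exec() enabled) escapes the directory with escapeshellarg() and asks find for regular files only with -type f, since the output above also lists directories. findFiles() is a hypothetical name; file names that contain newlines would still be split incorrectly.

```php
<?php
// Sketch (assumes a Unix-like system with find(1) and shell_exec() enabled).
// findFiles() is a hypothetical helper name.
function findFiles(string $dir): array
{
    // -L follows symlinks, -type f keeps regular files only
    $output = shell_exec('find -L ' . escapeshellarg($dir) . ' -type f');
    if (!is_string($output) || trim($output) === '') {
        return []; // no files found, or the command could not run
    }
    return explode("\n", trim($output));
}
```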

The next-shortest choice using only PHP is glob(), but it doesn't scan subdirectories the way you want (you'd have to loop through the results, use is_dir(), and then call your function again):

http://us3.php.net/glob

$files = dir_scan('/etc/php5/*'); 
print_r($files);

function dir_scan($folder) {
    $files = glob($folder) ?: []; // glob() returns false on error
    foreach ($files as $f) {
        if (is_dir($f)) {
            $files = array_merge($files, dir_scan($f .'/*')); // scan subfolder
        }
    }
    return $files;
}
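A files-only variant of the same recursion, sketched under the assumption of PHP 7.0+: directories are descended into but not returned, and a false result from glob() is treated as an empty list. glob_files() is a hypothetical name; like dir_scan(), the * pattern skips hidden (dot) files, and characters such as [ in the directory name would be interpreted as glob syntax.

```php
<?php
// Sketch (assumes PHP 7.0+): recursive glob that returns files only.
// glob_files() is a hypothetical helper name.
function glob_files(string $dir): array
{
    $files = [];
    foreach (glob($dir . '/*') ?: [] as $path) {
        if (is_dir($path)) {
            $files = array_merge($files, glob_files($path)); // recurse into subfolder
        } elseif (is_file($path)) {
            $files[] = $path;
        }
    }
    return $files;
}
```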

Every other way requires far more code than should be necessary for something so simple.

msEmmaMays answered Nov 16 '22 21:11


The steps are as follows.

First, opendir() opens the directory:

$dh = opendir($dir);

Next, you read the entries in $dh:

$file = readdir($dh);

You can find all the details in the PHP manual entry for opendir(), and searching for how to read the whole directory structure turned up this:

http://www.codingforums.com/showthread.php?t=71882
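Put together, those steps become a recursive function. A minimal sketch, assuming PHP 7.0+; read_tree() is a hypothetical name, and the explicit !== false comparison matters because a file named 0 would otherwise end the loop early.

```php
<?php
// Sketch (assumes PHP 7.0+): opendir()/readdir() applied recursively to
// collect every file path into one array. read_tree() is hypothetical.
function read_tree(string $dir): array
{
    $paths = [];
    $dh = opendir($dir);
    if ($dh === false) {
        return $paths; // unreadable directory: nothing to add
    }
    // Compare with false explicitly: a file named "0" is falsy.
    while (($file = readdir($dh)) !== false) {
        if ($file === '.' || $file === '..') {
            continue; // skip the current and parent directory entries
        }
        $path = $dir . '/' . $file;
        if (is_dir($path)) {
            $paths = array_merge($paths, read_tree($path)); // descend
        } else {
            $paths[] = $path;
        }
    }
    closedir($dh);
    return $paths;
}
```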

Tanmay answered Nov 16 '22 20:11