
Recursively search all directories for an array of strings in php

I am new to PHP and am looking for the fastest way to recursively search all directories for an array of strings.

This is what I am doing at the moment:

$contents_list = array("xyz","abc","hello"); // this list can grow any size
$path = "/tmp/"; //user will give any path which can contain multi level sub directories

$dir = new RecursiveDirectoryIterator($path);

foreach(new RecursiveIteratorIterator($dir) as $filename => $file) {
    $fd = fopen($file,'r');
    if($fd) {
        while(!feof($fd)) {
            $line = fgets($fd);
            foreach($contents_list as $content) {
                if(strpos($line, $content) != false) {
                    echo $line."\n";
                }
            }         
        }
    }
    fclose($fd);
}

Here I iterate recursively over all directories, and then for each line of each file I loop over the contents array to search for every string.

Is there a better way to do this kind of search? Please suggest a faster alternative.

Thanks

inari6 asked Nov 14 '13 06:11





1 Answer

If you're allowed to execute shell commands in your environment (and assuming you're running your script on *nix), you could call the native grep command recursively. That would give you the fastest results.

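$contents_list = array("xyz", "abc", "hello");
$path = "/tmp/";
// Join the strings with \| (alternation in GNU grep's basic regex syntax)
$pattern = implode('\|', $contents_list);
// Escape both arguments so quotes or spaces cannot break the shell command
$command = "grep -r " . escapeshellarg($pattern) . " " . escapeshellarg($path);
$output = array();
exec($command, $output);
foreach ($output as $match) {
    echo $match . "\n"; // double quotes, so \n is a real newline
}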
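Because the search terms here are literal strings rather than regular expressions, one variation worth considering is to pass each term with its own -e option and add -F (fixed strings), so that characters such as . or * inside a term are not interpreted as regex syntax. The helper name buildGrepCommand is hypothetical and not part of the original answer; treat this as a sketch under those assumptions.

```php
<?php
// Hypothetical helper: builds a recursive, fixed-string grep command line.
function buildGrepCommand(array $needles, $path) {
    $parts = array('grep', '-rF');
    foreach ($needles as $needle) {
        // One -e option per search term, each escaped for the shell
        $parts[] = '-e';
        $parts[] = escapeshellarg($needle);
    }
    // "--" ends option parsing, so a path starting with "-" is not read as a flag
    $parts[] = '--';
    $parts[] = escapeshellarg($path);
    return implode(' ', $parts);
}

$command = buildGrepCommand(array("xyz", "abc", "hello"), "/tmp/");
echo $command . "\n";
```

The resulting string can be passed to exec() exactly like the command above.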

If the disable_functions directive is in effect and you can't call grep, you could use your approach with RecursiveDirectoryIterator and reading the files line by line, using strpos on each line. Please note that strpos requires a strict equality check (use !== false instead of != false), otherwise you'll skip matches at the beginning of a line.
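A minimal illustration of why the strict check matters; the sample string is made up for the demonstration.

```php
<?php
$line = "hello world";

// strpos() returns 0 here because "hello" starts at the first character.
$pos = strpos($line, "hello");

var_dump($pos != false);   // bool(false): 0 is loosely equal to false, so the match is missed
var_dump($pos !== false);  // bool(true): integer 0 is not identical to boolean false
```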

A slightly faster way is to use glob recursively to obtain a list of files, and to read each file in one go with file() instead of scanning it line by line with fgets(). According to my tests, this approach gives you about a 30-35% time advantage over yours.

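function recursiveDirList($dir, $prefix = '') {
    $dir = rtrim($dir, '/');
    $result = array();

    // GLOB_MARK appends a slash to directory names, which is how they are detected below.
    // glob() can return false on error, so fall back to an empty array.
    foreach (glob("$dir/*", GLOB_MARK) ?: array() as $f) {
        if (substr($f, -1) === '/') {
            $result = array_merge($result, recursiveDirList($f, $prefix . basename($f) . '/'));
        } else {
            $result[] = $prefix . basename($f);
        }
    }

    return $result;
}

$files = recursiveDirList($path);
foreach ($files as $filename) {
    // file() reads the whole file into an array of lines
    $file_content = file(rtrim($path, '/') . '/' . $filename);
    foreach ($file_content as $line) {
        foreach ($contents_list as $content) {
            if (strpos($line, $content) !== false) {
                // Lines from file() keep their line ending, so normalize it
                echo rtrim($line, "\r\n") . "\n";
            }
        }
    }
}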

Credit for the recursive glob function goes to http://proger.i-forge.net/3_ways_to_recursively_list_all_files_in_a_directory/Opc

To sum it up, performance-wise you have the following ranking (results in seconds for a fairly large directory tree containing ~1200 files in total, using two common text patterns):

  1. call grep via exec() - 2.2015s
  2. use recursive glob and read files with file() - 9.4443s
  3. use RecursiveDirectoryIterator and read files line by line with fgets() - 15.1183s
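
These timings depend on the machine, the file system and the data, so it is worth measuring on your own directory tree. A rough way to do that is sketched below with microtime(true); the helper name timeIt and the placeholder workload are assumptions for illustration, not the harness used for the numbers above.

```php
<?php
// Hypothetical helper: runs $fn once and returns the elapsed wall-clock time in seconds.
function timeIt(callable $fn) {
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

$elapsed = timeIt(function () {
    // Placeholder workload; swap in one of the search approaches to compare them.
    usleep(1000);
});

printf("%.4fs\n", $elapsed);
```

Run each approach several times and compare the averages, since the first run is often slower while the files are not yet in the OS cache.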
András Szepesházi answered Sep 30 '22 19:09
