file_exists() versus in_array() of scandir() -- which is faster? [closed]

Tags: php

Let's say we have a loop like this:

foreach($entries as $entry){ // let's say this loops 1000 times
   if (file_exists('/some/dir/'.$entry.'.jpg')){
      echo 'file exists';
   }
}

I assume this has to access the HDD 1000 times and check if each file exists.

What about doing this instead?

$files = scandir('/some/dir/');
foreach($entries as $entry){ // let's say this loops 1000 times
   if (in_array($entry.'.jpg', $files)){
      echo 'file exists';
   }
}

Question 1: If this accesses the HDD once, then I believe it should be a lot faster. Am I right on this one?

However, what if I have to check sub-directories for a file, like this:

foreach($entries as $entry){ // let's say this loops 1000 times
   if (file_exists('/some/dir/'.$entry['id'].'/'.$entry['name'].'.jpg')){
      echo 'file exists';
   }
}

Question 2: If I want to apply the above technique (files in array) to check if the entries exist, how can I scandir() sub-directories into the array, so that I can compare the file existence using this method?

Frantisek asked Jan 25 '13 07:01



1 Answer

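In my opinion, scandir() will be faster because it reads the directory only once, whereas file_exists() makes a separate filesystem check for every entry and is known to be quite slow. Keep in mind that in_array() scans the whole list on every call, so with a very large directory the lookup itself can become the bottleneck.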
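If the in_array() scan turns out to matter, one option is to turn the scandir() result into array keys and test with isset(), which is a hash lookup rather than a linear search. Below is a minimal sketch of that idea, not a measured result. The helper name buildFileSet() and the throwaway demo directory are my own assumptions, not something from the question.

```php
<?php
// Hypothetical helper: returns the entries of $dir as a set (filename => true).
function buildFileSet(string $dir): array
{
    $names = scandir($dir);
    if ($names === false) {
        return []; // directory missing or unreadable
    }
    // Drop the "." and ".." pseudo-entries, then use the names as array keys.
    $names = array_diff($names, ['.', '..']);
    return array_fill_keys($names, true);
}

// Demo on a throwaway directory so the sketch runs as-is.
$dir = sys_get_temp_dir() . '/fileset_demo_' . uniqid();
mkdir($dir);
touch($dir . '/a.jpg');
touch($dir . '/b.jpg');

$files   = buildFileSet($dir);   // one directory read
$entries = ['a', 'b', 'c'];      // stand-in for the question's $entries

$found = [];
foreach ($entries as $entry) {
    if (isset($files[$entry . '.jpg'])) { // hash lookup, no array scan
        $found[] = $entry;
    }
}
echo implode(',', $found), PHP_EOL;

// Clean up the demo directory.
unlink($dir . '/a.jpg');
unlink($dir . '/b.jpg');
rmdir($dir);
```

The directory is still read only once; the only change from the in_array() version is how each name is looked up.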

Furthermore, you could use glob(), which lists all files in a directory that match a particular pattern (for example, '/some/dir/*.jpg'). See the glob() documentation for details.
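As a rough illustration of the glob() approach, the sketch below collects every .jpg in a directory and strips the extension so the names can be compared against the entries directly. The demo directory and the variable names ($matches, $ids, $idSet) are assumptions made up for this example.

```php
<?php
// Throwaway demo directory with two images and one unrelated file.
$dir = sys_get_temp_dir() . '/glob_demo_' . uniqid();
mkdir($dir);
touch($dir . '/1.jpg');
touch($dir . '/2.jpg');
touch($dir . '/notes.txt');

// glob() returns full paths matching the pattern, or false on error.
$matches = glob($dir . '/*.jpg');
if ($matches === false) {
    $matches = [];
}

// Reduce each path to its name without the ".jpg" suffix.
$ids = array_map(function ($path) {
    return basename($path, '.jpg');
}, $matches);

// Use the names as keys so each entry can be checked with isset().
$idSet = array_fill_keys($ids, true);

foreach (['1', '2', '3'] as $entry) {
    echo $entry, ': ', isset($idSet[$entry]) ? 'file exists' : 'missing', PHP_EOL;
}

// Clean up the demo directory.
unlink($dir . '/1.jpg');
unlink($dir . '/2.jpg');
unlink($dir . '/notes.txt');
rmdir($dir);
```

Compared with scandir(), the pattern filters out unrelated files up front, at the cost of glob() doing the matching for you.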

Regardless of my opinion, you can run a simple script like so to test the speed:

<?php

// Time the glob() method
$time_start = microtime(true);

// Do the glob() method here

$time = microtime(true) - $time_start;
echo '\'glob()\' finished in ' . $time . ' seconds' . PHP_EOL;

// Time the file_exists() method
// (restart the timer so the result is not cumulative)
$time_start = microtime(true);

// Do the file_exists() method here

$time = microtime(true) - $time_start;
echo '\'file_exists()\' finished in ' . $time . ' seconds' . PHP_EOL;

// Time the scandir() method (restart the timer again)
$time_start = microtime(true);

// Do the scandir() method here

$time = microtime(true) - $time_start;
echo '\'scandir()\' finished in ' . $time . ' seconds' . PHP_EOL;

?>

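I am not sure how the above script will behave with caching (both the operating system's file cache and PHP's own stat cache, which clearstatcache() resets), so you may have to split the tests into separate files and run them individually.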
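To make the comparison a little fairer, the timing can be wrapped in a small helper that clears PHP's stat cache before every run. This is only a sketch: timeIt() is a hypothetical helper, the demo directory and entry count are made up, and clearstatcache() does nothing about the operating system's cache, so the numbers are indicative at best.

```php
<?php
// Hypothetical helper: runs $fn once and returns [elapsed seconds, result].
function timeIt(callable $fn): array
{
    clearstatcache();             // reset PHP's stat cache before each run
    $start  = microtime(true);
    $result = $fn();
    return [microtime(true) - $start, $result];
}

// Demo data: 200 entries, of which every second one has a file on disk.
$dir = sys_get_temp_dir() . '/bench_demo_' . uniqid();
mkdir($dir);
$entries = range(1, 200);
foreach ($entries as $i) {
    if ($i % 2 === 0) {
        touch($dir . '/' . $i . '.jpg');
    }
}

$tests = [
    'file_exists' => function () use ($dir, $entries) {
        $n = 0;
        foreach ($entries as $e) {
            if (file_exists($dir . '/' . $e . '.jpg')) {
                $n++;
            }
        }
        return $n;
    },
    'scandir + in_array' => function () use ($dir, $entries) {
        $files = scandir($dir);
        $n = 0;
        foreach ($entries as $e) {
            if (in_array($e . '.jpg', $files, true)) {
                $n++;
            }
        }
        return $n;
    },
];

$results = [];
foreach ($tests as $label => $fn) {
    list($seconds, $hits) = timeIt($fn);
    $results[$label] = $hits;
    printf("%-20s %d hits in %.6f seconds\n", $label, $hits, $seconds);
}

// Clean up the demo directory.
foreach (glob($dir . '/*.jpg') as $path) {
    unlink($path);
}
rmdir($dir);
```

Both methods should report the same number of hits; if they do not, the test itself is wrong and the timings mean nothing.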

Update 1

You could also call memory_get_usage(), which returns the amount of memory currently allocated to the PHP script. This is useful for comparing the memory cost of holding a whole directory listing in an array. See here for more details.
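For example, you could read memory_get_usage() before and after building the list to get a rough idea of what the array costs. The sketch below fakes a 10,000-file listing instead of reading a real directory; the file names and the count are assumptions for illustration only.

```php
<?php
$before = memory_get_usage();

// Stand-in for a scandir() result: 10,000 made-up file names used as keys.
$names = array_map(function ($i) {
    return $i . '.jpg';
}, range(1, 10000));
$list = array_fill_keys($names, true);
unset($names); // keep only the lookup array in memory

$after = memory_get_usage();
$delta = $after - $before;

printf("Lookup array with %d names uses roughly %d bytes\n", count($list), $delta);
printf("Peak usage so far: %d bytes\n", memory_get_peak_usage());
```

The exact figure depends on the PHP version and platform, so treat it as a ballpark rather than a precise measurement.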

Update 2

As for your second question, there are several ways you can list all files in a directory, including sub-directories. See the answers to this question:

Scan files in a directory and sub-directory and store their path in array using php
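One way to do this, sketched here with SPL's RecursiveDirectoryIterator, is to walk the tree once and store each file's path relative to the root as an array key. The entries can then be checked with isset() on 'id/name.jpg'. The function name buildRecursiveFileSet() and the demo data are hypothetical, and the relative paths assume a Unix-style '/' separator.

```php
<?php
// Hypothetical helper: returns a set of file paths relative to $root.
function buildRecursiveFileSet(string $root): array
{
    $root = rtrim($root, '/');
    $set  = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile()) {
            // Strip the root and the following slash: "42/cat.jpg"
            $relative = substr($file->getPathname(), strlen($root) + 1);
            $set[$relative] = true;
        }
    }
    return $set;
}

// Demo tree: <root>/42/cat.jpg
$root = sys_get_temp_dir() . '/recursive_demo_' . uniqid();
mkdir($root . '/42', 0777, true);
touch($root . '/42/cat.jpg');

$files   = buildRecursiveFileSet($root);
$entries = [
    ['id' => 42, 'name' => 'cat'],
    ['id' => 7,  'name' => 'dog'],
];

$found = [];
foreach ($entries as $entry) {
    if (isset($files[$entry['id'] . '/' . $entry['name'] . '.jpg'])) {
        $found[] = $entry['name'];
    }
}
echo implode(',', $found), PHP_EOL;

// Clean up the demo tree.
unlink($root . '/42/cat.jpg');
rmdir($root . '/42');
rmdir($root);
```

Whether this beats 1000 file_exists() calls depends on how many files sit under the root compared with how many entries you check, so it is worth timing both on your own data.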

Ben Carey answered Nov 13 '22 02:11
