I have a directory of images that could contain anywhere from 100 to many thousands of images. I need to take a random sample of 81 images from this directory, to be used in an array.
I am currently using the following to grab an image:
$locations = 'compressed/';
$images = glob($locations . '*', GLOB_BRACE);
$selected = $images[array_rand($images)];
The issue with this method is that, when it is called repeatedly, it can return the same image twice (albeit rarely in large directories).
I have also seen that opendir could be used to read the directory, followed by shuffling the resulting array. Can someone please tell me which is more efficient to use? I would assume that shuffling and then grabbing the first 81 elements would give better results but be slower for larger counts (as shuffling large arrays takes longer).
Any suggestions on the time complexity of my current setup as opposed to using opendir (or other methods I may not know of)?
Thanks
This is a really good question; I wish more of these would come up.
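$start = microtime(true);

// Yield the name of every regular file under $path, one at a time.
function recursiveDirectoryIterator($path) {
    foreach (new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path)) as $file) {
        if (!$file->isDir()) {
            // getFilename() already includes the extension. The sample output
            // below came from a version that also appended getExtension(),
            // which is why its names end in "phpphp", "ymlyml" and so on.
            yield $file->getFilename();
        }
    }
}

$instance = recursiveDirectoryIterator('../vendor');
$files = [];
foreach ($instance as $value) {
    $files[] = $value;
}
$total_files = count($files);
$random_array = [];
// Never ask for more files than exist, otherwise the loop below cannot finish.
$total_randoms = min(81, $total_files);
for (;;) {
    if (count($random_array) == $total_randoms) {
        break;
    }
    // random_int() is inclusive at both ends, so the upper bound is the last index.
    $rand = random_int(0, $total_files - 1);
    // Keying by index means a repeated index is skipped instead of stored twice.
    if (!isset($random_array[$rand])) {
        $random_array[$rand] = $files[$rand];
    }
}
echo "Mem peak usage: " . (memory_get_peak_usage(true)/1024/1024) . " MiB" . '<br>';
echo "Total number of files: " . $total_files . '<br>';
echo "Completed in: ", microtime(true) - $start, " seconds" . '<br>';
echo '<pre>';
print_r($random_array);
die;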
Output
Mem peak usage: 2 MiB
Total number of files: 12972
Completed in: 0.74663186073303 seconds
Array
(
[6118] => PreDec.phpphp
[4560] => LabelMaker.phpphp
[10360] => RecursiveDirectoryIterator.phpphp
[4124] => Enum.phpphp
[2671] => ImportCommand.phpphp
[1250] => WebDriverTest.phpphp
[10518] => AutoExpireFlashBagTest.phpphp
[6805] => zsdtPackTask.phpphp
[4288] => HTML.Trusted.txttxt
[6462] => border-disable.phptphpt
[4980] => main.ymlyml
[505] => StepTested.phpphp
[5219] => xhprof.ini.j2j2
[12959] => RequestInterface.phpphp
[1423] => xd5.phpphp
[4285] => HTML.TidyAdd.txttxt
[4930] => .travis.ymlyml
[12013] => Defined.phpphp
[8779] => Markdown.phpphp
[5979] => pt.phpphp
[278] => AbstractAdapter.phpphp
[2155] => SemVerTest.phpphp
[523] => ServicesResolverFactory.phpphp
[11686] => AbstractDumper.phpphp
[7320] => Functions.phpphp
[7763] => mocked_clone.tpl.distdist
[11541] => test_landscape.gifgif
[3557] => RegionSelectorSpec.phpphp
[2600] => RoutingAccessSniff.phpphp
[9496] => LoaderTest.phpphp
[4958] => setup-RedHat.ymlyml
[3477] => api.featurefeature
[7975] => WtfCommand.phpphp
[9001] => ElseIfDeclarationSniff.phpphp
[11696] => VarDumperTestTrait.phpphp
[11211] => empty.ymlyml
[10925] => ObjectRouteLoader.phpphp
[10936] => MatcherDumperInterface.phpphp
[2685] => ConnectCommand.phpphp
[9066] => EmptyStyleDefinitionSniff.phpphp
[3536] => BehatTestExtensionInstallStorage.phpphp
[4720] => ansible-args.mdmd
[326] => ZipOutputParser.phpphp
[9565] => BufferedOutput.phpphp
[712] => CliExtension.phpphp
[3436] => .travis.ymlyml
[4471] => HTMLPurifier.kses.phpphp
[2764] => RouteSubscriberCommand.phpphp
[10633] => RoutableFragmentRenderer.phpphp
[6906] => Reference.phpphp
[11663] => DoctrineCaster.phpphp
[8042] => GitHubChecker.phpphp
[1466] => ImageDriverInterface.phpphp
[2652] => DrupalCommand.phpphp
[7265] => classUsesNamespacedFunction.phpphp
[12129] => ExtensionInterface.phpphp
[12184] => ConditionalExpression.phpphp
[12128] => EscaperExtension.phpphp
[6678] => JsHintTask.phpphp
[5351] => main.ymlyml
[2104] => _bootstrap.phpphp
[143] => deploy_branch
[1360] => x8f.phpphp
[4713] => composer-dependency.mdmd
[7495] => ExceptionInAssertPostConditionsTest.phpphp
[4508] => info.txttxt
[8369] => 6.1.3-curl-adapter.phpphp
[3093] => create-data.ymlyml
[1882] => .gitkeepgitkeep
[3747] => example.makemake
[507] => EventDispatchingBackgroundTester.phpphp
[3336] => shell.ymlyml
[397] => AnnotationReader.phpphp
[4005] => xhUnitTest.phpphp
[5168] => test.ymlyml
[10909] => MissingMandatoryParametersException.phpphp
[8686] => FacetSetTest.phpphp
[2321] => FileCache.phpphp
[10538] => StreamedResponseTest.phpphp
[12572] => in.testtest
[7031] => StringContainsToken.phpphp
)
Code breakdown:
I used RecursiveDirectoryIterator with a generator to keep memory usage down while walking the directory tree.
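If memory is the main concern, a generator pairs well with reservoir sampling, which is a different technique from the loop in the answer: it draws the sample while iterating, so the full file list never has to be stored. The following is a minimal sketch, assuming the file names arrive from any iterable; reservoirSample and fakeFiles are hypothetical names used only for illustration.

```php
<?php
/**
 * Reservoir sampling (Algorithm R): keep a uniform random sample of up to $k
 * items from an iterable of unknown length, holding at most $k items in memory.
 */
function reservoirSample(iterable $items, int $k): array
{
    $reservoir = [];
    $seen = 0;
    foreach ($items as $item) {
        $seen++;
        if ($seen <= $k) {
            // Fill the reservoir with the first $k items.
            $reservoir[] = $item;
            continue;
        }
        // Item number $seen replaces a random slot with probability $k / $seen.
        $j = random_int(0, $seen - 1);
        if ($j < $k) {
            $reservoir[$j] = $item;
        }
    }
    return $reservoir;
}

// Hypothetical generator standing in for a directory walk.
function fakeFiles(int $n): Generator
{
    for ($i = 0; $i < $n; $i++) {
        yield "file_$i.jpg";
    }
}

$sample = reservoirSample(fakeFiles(12972), 81);
echo count($sample), "\n"; // prints 81
```

If the iterable yields fewer than 81 items, the function simply returns all of them, so there is no loop that can fail to terminate.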
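Next, instead of shuffling a huge array, I chose another approach: generate 81 random, non-repeating indexes between 0 and the last index of the files array. The code above enforces uniqueness by using each index as the array key, so a repeated index is simply skipped. Alternatively, once you have the random indexes you can pull the matching entries out with array_intersect_key, which is fairly fast.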
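The "unique indexes plus array_intersect_key" idea can also be sketched with array_rand(), which returns distinct keys when asked for more than one. This is only an illustration: pickUnique and the generated file names are placeholders, not part of the original code.

```php
<?php
/**
 * Pick up to $count distinct entries from $files by drawing unique keys
 * and keeping only the entries that have those keys.
 */
function pickUnique(array $files, int $count): array
{
    $wanted = min($count, count($files));
    if ($wanted < 1) {
        return []; // array_rand() rejects a count below 1
    }
    // With a count of 1 array_rand() returns a single key, hence the cast.
    $keys = (array) array_rand($files, $wanted);
    // array_flip() turns the picked keys into array keys for the intersection.
    return array_intersect_key($files, array_flip($keys));
}

// Hypothetical file list standing in for the $files array.
$files = [];
for ($i = 0; $i < 1000; $i++) {
    $files[] = "image_$i.jpg";
}

$sample = pickUnique($files, 81);
echo count($sample), "\n"; // prints 81
```

array_rand() returns the keys in the order they appear in the original array, so the result is a random subset but not in random order; call shuffle() on it if the order matters.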
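Do note a logical pitfall: if the directory contains fewer files than the number of samples requested, a loop like this can never collect enough unique entries and will run forever, so the sample size has to be capped at the file count.
Final note: I'm absolutely sure somebody smarter than me can think of something better, but for now this will work.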
Also, since I'm using PHP 7.x I have the advantage of opcache, so performance will be a bit better on my side; your results may vary.
Please note that if the number of files is small relative to the sample size, the loop will run for longer, since the chance of a collision is higher when sampling from a small set.
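When collisions are a concern, one alternative to retrying is a partial Fisher-Yates shuffle, which swaps only the first 81 positions and therefore does a fixed amount of work regardless of how small the directory is. This is a sketch of that swapped-in technique, not the method used above; partialShuffleSample is a hypothetical name.

```php
<?php
/**
 * Partial Fisher-Yates shuffle: randomise only the first $k positions of the
 * list and return them. Runs exactly min($k, n) iterations with no retries.
 */
function partialShuffleSample(array $items, int $k): array
{
    $items = array_values($items); // work on a re-indexed copy
    $n = count($items);
    $k = max(0, min($k, $n));      // cap the sample size at the list size
    for ($i = 0; $i < $k; $i++) {
        // Choose a partner from the part of the list not yet fixed.
        $j = random_int($i, $n - 1);
        [$items[$i], $items[$j]] = [$items[$j], $items[$i]];
    }
    return array_slice($items, 0, $k);
}

// Hypothetical small directory: only 100 files for an 81-file sample.
$names = [];
for ($i = 0; $i < 100; $i++) {
    $names[] = "image_$i.jpg";
}

$picked = partialShuffleSample($names, 81);
echo count($picked), "\n"; // prints 81
```

Because the sample size is capped at the list length, asking for 81 files from a directory of 10 returns all 10 in random order.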