
How do I detect groups of common strings in filenames

Tags: regex, php

I'm trying to figure out a way to detect groups of related files. For instance, suppose a given directory has the following files:

  • Birthday001.jpg
  • Birthday002.jpg
  • Birthday003.jpg
  • Picknic1.jpg
  • Picknic2.jpg
  • Afternoon.jpg

I would like to condense the listing to something like

  • Birthday ( 3 pictures )
  • Picknic ( 2 pictures )
  • Afternoon ( 1 picture )

How should I go about detecting the groups?

asked Jul 26 '09 17:07 by Ambirex


2 Answers

Here's one way you can solve this, which is more efficient than a brute force method.

  • Load all the names into an associative array whose keys are the names and whose values are the names with the digits stripped (preg_replace('/\d+/', '', $key)).

You will have something like $arr1 = [Birthday001 => Birthday, Birthday002 => Birthday, ...]

  • Now build a second associative array whose keys are the values from the first array and whose values are counts. Set the count to 1 the first time you see a key, and increment it each time you see that key again.
  • In the end, the second array contains the names and counts, just as you wanted. Something like $arr2 = [Birthday => 3, ...]
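
The steps above can be sketched roughly as follows. This is only one possible reading of the answer, assuming PHP 7 or later; the function name groupByStrippedDigits() and the use of pathinfo() to drop the extension are my own assumptions, not part of the original answer.

```php
<?php

/**
 * Count files per group, where a group is the filename without its
 * extension and without any digits.
 * NOTE: groupByStrippedDigits() is a hypothetical helper name.
 */
function groupByStrippedDigits(array $filenames): array
{
    // Step 1: map each filename to its digit-free base name,
    // e.g. "Birthday001.jpg" => "Birthday".
    $stripped = [];
    foreach ($filenames as $filename) {
        $base = pathinfo($filename, PATHINFO_FILENAME); // drop ".jpg" (assumption)
        $stripped[$filename] = preg_replace('/\d+/', '', $base);
    }

    // Step 2: count how many filenames fall into each group.
    $counts = [];
    foreach ($stripped as $group) {
        $counts[$group] = ($counts[$group] ?? 0) + 1;
    }

    return $counts;
}

$files = ['Birthday001.jpg', 'Birthday002.jpg', 'Birthday003.jpg',
          'Picknic1.jpg', 'Picknic2.jpg', 'Afternoon.jpg'];

foreach (groupByStrippedDigits($files) as $group => $count) {
    echo $group, ' ( ', $count, ' ', $count === 1 ? 'picture' : 'pictures', " )\n";
}
```

Note that stripping every digit also merges names that differ only by digits in the middle, so "Trip2008a.jpg" and "Trip2009a.jpg" would both land in the group "Tripa". If that is not wanted, strip only trailing digits with '/\d+$/' instead.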
answered Oct 13 '22 10:10 by Artem Russakovskii


Simply build a histogram whose keys are modified by a regex:

<?php

# input
$filenames = array("Birthday001.jpg", "Birthday002.jpg", "Birthday003.jpg", "Picknic1.jpg", "Picknic2.jpg", "Afternoon.jpg");

# create histogram
$histogram = array();
foreach ($filenames as $filename) {
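    # strip optional trailing digits plus the extension; \d* (not \d+)
    # so that names without digits, like "Afternoon.jpg", also lose ".jpg"
    $name = preg_replace('/\d*\.[^.]*$/', '', $filename);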
    if (isset($histogram[$name])) {
        $histogram[$name]++;
    } else {
        $histogram[$name] = 1;
    }
}

# output
foreach ($histogram as $name => $count) {
    if ($count == 1) {
        echo "$name ($count picture)\n";
    } else {
        echo "$name ($count pictures)\n";
    }
}

?>
answered Oct 13 '22 11:10 by vog