
RegEx pattern in glob function

I receive a filename in a function. I want to return all files with a similar filename from another directory. I wrote this:

    $thumbDir = $this->files_path.'thumbs/';
    $toglob = $thumbDir.pathinfo($name, PATHINFO_FILENAME ).'_[0-9]+\x[0-9]+_thb.'.pathinfo($name, PATHINFO_EXTENSION);
    foreach (glob($toglob) as $key => $value) {
        echo $value;
    }

But it doesn't work. I am looking for files whose names have this form:

oldFileName_[one or more digits]x[one or more digits]_thb.oldFileNameExtension

I would be very grateful if someone could help me with this :)

user3025978 asked May 09 '14 13:05



1 Answer

glob() is not a regex engine; it does shell-style wildcard matching. From a comment on the docs, its two main wildcards are ? and *:

glob uses two special symbols that act like sort of a blend between a meta-character and a quantifier. These two characters are the * and ?

The ? matches 1 of any character except a /

The * matches 0 or more of any character except a /

If it helps, think of the * as the pcre equivalent of .* and ? as the pcre equivalent of the dot (.)
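To make the two wildcards concrete, here is a minimal sketch. The scratch directory and the file names (a1.txt, a12.txt, b1.txt) are made up for illustration and are not from the question.

```php
<?php
// Hypothetical demo: create a scratch directory with a few empty files.
$dir = sys_get_temp_dir() . '/glob_demo_' . uniqid();
mkdir($dir);
foreach (['a1.txt', 'a12.txt', 'b1.txt'] as $name) {
    touch("$dir/$name");
}

// ? matches exactly one character, so only a1.txt fits "a?.txt".
$single = array_map('basename', glob("$dir/a?.txt"));

// * matches zero or more characters, so a1.txt and a12.txt both fit "a*.txt".
$many = array_map('basename', glob("$dir/a*.txt"));

print_r($single);
print_r($many);

// Clean up the scratch files again.
array_map('unlink', glob("$dir/*"));
rmdir($dir);
```

Neither wildcard can express "one or more digits": ? is always exactly one character, and * accepts any characters, not just digits.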

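glob() also accepts character classes such as [0-9], but it has no quantifiers, so the + in your expression _[0-9]+\x[0-9]+_thb. is treated as a literal character and "one or more digits" cannot be expressed. Instead, you can list the whole directory and test each name with preg_match():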

$glob = glob('/path/to/dir/*');
foreach ($glob as $file) {
    // Test only the file name, not the directories in the path
    if (preg_match('/_\d+x\d+_thb\./', basename($file))) {
        // Valid match
        echo $file;
    }
}

Note that in glob('/path/to/dir/*'), the * does not match a /, so this will not return any files in subdirectories. It only returns the files and directories directly inside that path; if you want to go deeper, you will have to write a recursive function.
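If subdirectories do need to be searched, one possible approach is PHP's SPL directory iterators rather than a hand-written recursive function. This is only a sketch: the helper name findMatchingFiles() and its parameters are hypothetical, not part of the original answer.

```php
<?php
// Hypothetical helper: collect files under $dir (at any depth) whose
// base name matches $regex.
function findMatchingFiles(string $dir, string $regex): array
{
    $matches = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $fileInfo) {
        // Match against the file name only, not the full path.
        if ($fileInfo->isFile() && preg_match($regex, $fileInfo->getFilename())) {
            $matches[] = $fileInfo->getPathname();
        }
    }
    sort($matches);
    return $matches;
}
```

A call such as findMatchingFiles('/path/to/dir', '/_\d+x\d+_thb\./') would then return matching files from that directory and everything below it.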

Note that I cleaned up your expression:

_\d+x\d+_thb\.

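\d is equivalent to [0-9] by default (with the u modifier it can also match other Unicode digits, such as Arabic-Indic ones). You do not need to escape x; in fact \x starts a hexadecimal escape in PCRE, so it would not match a literal x. You do want to escape the period (\.) so that it matches a literal dot.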
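The expression above matches any thumbnail name. To restrict it to thumbnails of one particular file, as the question asks, the original base name and extension can be added to the pattern. This is a rough sketch; thumbRegex() and findThumbs() are hypothetical helper names, and it assumes the thumbnails sit directly inside the thumbs directory.

```php
<?php
// Hypothetical helper: build an anchored regex for thumbnails of $name,
// e.g. "photo.jpg" -> names of the form "photo_<digits>x<digits>_thb.jpg".
function thumbRegex(string $name): string
{
    // preg_quote() escapes regex metacharacters in the file name parts.
    $base = preg_quote(pathinfo($name, PATHINFO_FILENAME), '/');
    $ext  = preg_quote(pathinfo($name, PATHINFO_EXTENSION), '/');
    return '/^' . $base . '_\d+x\d+_thb\.' . $ext . '$/';
}

// Hypothetical helper: list thumbnails of $name directly inside $thumbDir.
function findThumbs(string $thumbDir, string $name): array
{
    $regex = thumbRegex($name);
    $found = [];
    // glob() returns false on error, so fall back to an empty array.
    foreach (glob(rtrim($thumbDir, '/') . '/*') ?: [] as $file) {
        if (preg_match($regex, basename($file))) {
            $found[] = $file;
        }
    }
    return $found;
}
```

Inside the asker's method this might be called as findThumbs($this->files_path.'thumbs/', $name).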

Sam answered Oct 16 '22 15:10