
What is the most correct regular expression for a UNIX file path?

Tags: regex, path

What is the most correct regular expression (regex) for a UNIX file path?

For example, to detect something like this:

/usr/lib/libgccpp.so.1.0.2

It's easy to write a regular expression that matches most file paths, but what is the best one? Ideally it would also detect escaped whitespace sequences and the unusual characters you don't usually find in UNIX file paths.

Also, are there library functions in several different programming languages that provide a file path regex?

asked Feb 11 '09 at 16:02 by Neil


2 Answers

Short answer:

If you don't mind false positives when identifying paths, you only need to ensure that the path doesn't contain a NUL character; everything else is permitted (in particular, / is the name-separator character). A better approach is to resolve the given path with the appropriate file I/O function (e.g., File.exists() or File.getCanonicalFile() in Java).
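Both suggestions can be sketched in Python. This is a minimal sketch, assuming Python 3 and the standard `os` module; `os.path.exists` and `os.path.realpath` stand in for the Java `File` methods named above (they are roughly analogous, not identical). The helper names `looks_like_unix_path` and `resolve_path` are hypothetical.

```python
import os


def looks_like_unix_path(s: str) -> bool:
    """Permissive syntactic check: any non-empty string without a NUL
    character could be a UNIX path. Expect false positives."""
    return len(s) > 0 and "\0" not in s


def resolve_path(s: str):
    """Stricter check: ask the operating system. Returns the canonical
    absolute path (symlinks resolved) if it exists, otherwise None."""
    if not looks_like_unix_path(s):
        # Calls such as os.stat() or open() reject an embedded NUL with ValueError.
        return None
    if not os.path.exists(s):
        return None
    return os.path.realpath(s)


if __name__ == "__main__":
    print(looks_like_unix_path("/usr/lib/libgccpp.so.1.0.2"))
    print(looks_like_unix_path("bad\0name"))
    print(resolve_path("/"))
```

The first function only says whether a string *could* name a file; the second confirms that it actually does on the current machine, which is usually what you want when the input comes from a user.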

Long answer:

This depends on both the operating system and the file system. For example, Wikipedia's "Comparison of file systems" article notes that, besides the limits imposed by the file system itself,

MS-DOS, Microsoft Windows, and OS/2 disallow the characters \ / : ? * " > < | and NUL in file and directory names across all filesystems. Unices and Linux disallow the characters / and NUL in file and directory names across all filesystems.
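The character rules in the quoted passage apply to individual file and directory names, not to whole paths. Here is a minimal Python sketch of a per-name check built only from those rules. The names `UNIX_FORBIDDEN`, `WINDOWS_FORBIDDEN`, `is_valid_component` and `unix_path_components_ok` are hypothetical, and file-system-specific limits (length, encoding) are deliberately ignored.

```python
# Characters the quoted passage lists as disallowed in a single file or
# directory name, across all file systems.
UNIX_FORBIDDEN = frozenset("/\0")
WINDOWS_FORBIDDEN = frozenset('\\/:?*"><|\0')


def is_valid_component(name: str, forbidden: frozenset) -> bool:
    """True if `name` is non-empty and contains none of the forbidden
    characters. Checks one component (e.g. 'libgccpp.so.1.0.2'), not a path."""
    return bool(name) and not any(ch in forbidden for ch in name)


def unix_path_components_ok(path: str) -> bool:
    """Split a UNIX path on '/' and check each component. Empty pieces
    (from a leading '/' or repeated '//') are skipped, not rejected."""
    if not path or "\0" in path:
        return False
    return all(is_valid_component(c, UNIX_FORBIDDEN) for c in path.split("/") if c)
```

A name like `a:b` passes the UNIX check but fails the Windows one, which illustrates how much more permissive the UNIX rules are.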

In Windows, the following reserved device names are also not permitted as filenames:

CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5,
COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, 
LPT5, LPT6, LPT7, LPT8, LPT9
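The reserved-name list can be checked with a small lookup table. This Python sketch is hedged: comparing case-insensitively and ignoring everything after the first dot (so `nul.txt` also counts as reserved) are assumptions based on Microsoft's file-naming guidance, so check that documentation for the exact rules. `WINDOWS_RESERVED` and `is_reserved_windows_name` are hypothetical names.

```python
# Reserved device names from the list above: CON, PRN, AUX, NUL, COM1-9, LPT1-9.
WINDOWS_RESERVED = frozenset(
    ["CON", "PRN", "AUX", "NUL"]
    + [f"COM{i}" for i in range(1, 10)]
    + [f"LPT{i}" for i in range(1, 10)]
)


def is_reserved_windows_name(name: str) -> bool:
    """True if `name` is a reserved Windows device name.

    Assumption: the comparison is case-insensitive and any extension is
    ignored, so 'con', 'CON' and 'NUL.txt' are all treated as reserved."""
    stem = name.split(".", 1)[0]
    return stem.upper() in WINDOWS_RESERVED
```

Note that `COM10` and `console` are not on the list and so are not flagged.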
answered Sep 28 '22 at 17:09 by Zach Scrivena


The proper regular expression to match all UNIX paths is: [^\0]+

That is, one or more characters that are not a NUL.
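Here is how this pattern might be applied with Python's `re` module (a sketch; `UNIX_PATH_RE` and `is_unix_path` are hypothetical names). The pattern must be anchored to the whole string. Otherwise any input containing a single non-NUL character would "match".

```python
import re

# [^\0]+ : one or more characters that are not NUL.
# Python's re also accepts \0 for NUL; \x00 is spelled out here for clarity.
UNIX_PATH_RE = re.compile(r"[^\x00]+")


def is_unix_path(s: str) -> bool:
    # fullmatch() anchors the pattern at both ends; re.search() would
    # accept "a\0b" because the substring "a" matches.
    return UNIX_PATH_RE.fullmatch(s) is not None


if __name__ == "__main__":
    print(is_unix_path("/usr/lib/libgccpp.so.1.0.2"))
    print(is_unix_path("/tmp/name with spaces/and\nnewline"))
    print(is_unix_path("bad\0path"))
```

Because every non-NUL character is allowed, this validates path *syntax*; it is not suited to picking paths out of surrounding free text. For byte-string paths, the equivalent pattern is `rb"[^\x00]+"`.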

answered Sep 28 '22 at 18:09 by Darron