What is the most correct regular expression (regex) for a UNIX file path?
For example, to detect something like this:
/usr/lib/libgccpp.so.1.0.2
It's pretty easy to write a regular expression that matches most file paths, but what's the best one, including one that can handle escaped whitespace sequences and unusual characters you don't usually find in UNIX file paths?
Also, are there library functions in several different programming languages that provide a file path regex?
If you don't mind false positives when identifying paths, then you really just need to ensure the path doesn't contain a NUL character; everything else is permitted (in particular, / is the name-separator character). The better approach would be to resolve the given path using the appropriate file I/O function (e.g. File.exists(), File.getCanonicalFile() in Java).
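As a rough illustration of the "resolve it instead of matching it" approach, here is a minimal Java sketch. The class name PathProbe and its two helper methods are made up for this example; only File.exists() and File.getCanonicalFile() come from the answer above. It assumes the candidate string has already been extracted from whatever text you are scanning.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper class, for illustration only.
final class PathProbe {

    // Cheap pre-check: a UNIX path must be non-empty and must not contain NUL.
    private static boolean isPlausible(String candidate) {
        return !candidate.isEmpty() && candidate.indexOf('\0') < 0;
    }

    // True if the candidate names something that currently exists on disk.
    static boolean existsOnDisk(String candidate) {
        return isPlausible(candidate) && new File(candidate).exists();
    }

    // Canonical form of the path (".", ".." and symlinks resolved),
    // or null if the candidate is implausible or cannot be resolved.
    static String canonicalOrNull(String candidate) {
        if (!isPlausible(candidate)) {
            return null;
        }
        try {
            return new File(candidate).getCanonicalFile().getPath();
        } catch (IOException e) {
            return null;
        }
    }
}
```

Note that this answers a different question from the regex: it tells you whether the path exists on the machine running the check, not whether the string is syntactically a legal path.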
Long answer:
This is both operating system and file system dependent. For example, the Wikipedia comparison of file systems notes that besides the limits imposed by the file system,
MS-DOS, Microsoft Windows, and OS/2 disallow the characters \ / : ? * " > < | and NUL in file and directory names across all filesystems. Unices and Linux disallow the characters / and NUL in file and directory names across all filesystems.
In Windows, the following reserved device names are also not permitted as filenames:
CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5,
COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4,
LPT5, LPT6, LPT7, LPT8, LPT9
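If you also need to screen names for Windows, the list above can be checked with a simple set lookup. This is only a sketch: the class name WindowsReservedNames is hypothetical, the comparison is made case-insensitive on the assumption that Windows treats these names that way, and it looks at a single bare file name only (not a full path, and not names with an extension).

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, for illustration only.
final class WindowsReservedNames {

    // The reserved device names exactly as listed above.
    private static final Set<String> RESERVED = Set.of(
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9");

    // True if the bare file name equals one of the reserved device names.
    // Assumption: matching is case-insensitive.
    static boolean isReserved(String fileName) {
        return RESERVED.contains(fileName.toUpperCase(Locale.ROOT));
    }
}
```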
The proper regular expression to match all UNIX paths is: [^\0]+
That is, one or more characters that are not a NUL.
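For what it's worth, here is how that pattern could be used from Java. The class and method names are invented for this sketch. The NUL is written as \x00 in the pattern because Java's regex syntax expects octal digits after \0.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class UnixPathRegex {

    // One or more characters, none of which is NUL (the [^\0]+ pattern above).
    private static final Pattern ANY_UNIX_PATH = Pattern.compile("[^\\x00]+");

    // True if the whole string could be a UNIX path, i.e. it is non-empty
    // and contains no NUL character.
    static boolean isPossibleUnixPath(String s) {
        return ANY_UNIX_PATH.matcher(s).matches();
    }
}
```

As expected, this accepts nearly anything, including strings with spaces, backslashes, or newlines, so it is only useful for rejecting input that cannot be a path at all.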