While answering this question about safely escaping filenames with spaces (and potentially other characters), one of the answers said to use Perl's built-in quotemeta function.
The documentation of quotemeta states:
quotemeta (and \Q ... \E ) are useful when interpolating strings
into regular expressions, because by default an interpolated variable
will be considered a mini-regular expression.
In the documentation for quotemeta, the only use mentioned is escaping every character other than /[A-Za-z_0-9]/ with a \ for use in a regex. It says nothing about filenames. Still, escaping filenames seems like a very pleasant, if undocumented, side effect.
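As a quick illustration of the documented behavior, here is a minimal sketch showing quotemeta backslashing every ASCII non-word character, and \Q ... \E doing the same inside a regex. The sample file name is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical file name, used only for illustration.
my $name = 'my file (v1.0).txt';

# Every ASCII character outside [A-Za-z_0-9] gets a leading backslash.
my $quoted = quotemeta($name);
print "$quoted\n";    # my\ file\ \(v1\.0\)\.txt

# Inside a regex, \Q...\E applies the same escaping, so the dots and
# parentheses match literally instead of acting as metacharacters.
my $exact = ('my file (v1.0).txt' =~ /^\Q$name\E$/) ? 1 : 0;
my $loose = ('my file (v1x0).txt' =~ /^\Q$name\E$/) ? 1 : 0;
print "exact name matches: $exact\n";       # 1
print "dot as wildcard matches: $loose\n";  # 0
```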
In a comment on Sinan Ünür's answer to the earlier question, hobbs states:
shell escaping is different from regexp escaping, and although I can't come up with a situation where quotemeta would give a truly unsafe result, it's not meant for the task. If you must escape, instead of bypassing the shell, I suggest trying String::ShellQuote which takes a more conservative approach using sh single quotes to defang everything except single quotes themselves, and backslashes for single quotes. – hobbs Aug 13 '09 at 14:25
Is it safe -- completely -- to use quotemeta in place of more conservative file quoting like String::ShellQuote? Is quotemeta safe with UTF-8 or other multibyte characters?
I put together a test, and the results are unclear. quotemeta seems to work well, except for a file name or directory name with a \n or \r in it. While rare, these characters are legal in Unix file names and I have seen them. Recall that certain characters, such as LF, CR and NUL, cannot be escaped with \. I ran quotemeta over the roughly 700k files on my hard drive and had no failures.
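The newline failure can be reproduced without creating any files. The sketch below assumes a POSIX sh with printf available and uses a made-up three-character name; it passes the quotemeta result through the shell and checks what comes back. An unquoted backslash followed by a newline is a line continuation in sh, so the shell drops both characters.

```perl
use strict;
use warnings;

# Hypothetical name containing a newline, for illustration only.
my $name   = "a\nb";
my $quoted = quotemeta($name);    # "a", backslash, newline, "b"

# Assumes a POSIX sh with printf. The shell removes the
# backslash-newline pair as a line continuation before printf runs.
my $seen = `printf '%s' $quoted`;

print "original length:  ", length($name), "\n";    # 3
print "shell saw length: ", length($seen), "\n";    # 2
print "name survived:    ", ($seen eq $name ? "yes" : "no"), "\n";    # no
```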
I have a suspicion (though I have not demonstrated it yet) that quotemeta might fail with multibyte characters where one or more of the bytes falls into the ASCII range. For example, à can be encoded as one character (UTF-8 C3 A0) or as two characters (U+0061 gives a, and U+0300 is a combining grave accent). The only demonstrated failure I have with quotemeta is with files that have a \n or \r in the path, which I created myself. I would be interested in other characters to put in nasty_names to test.
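One way to probe the multibyte question is to look at what quotemeta does to the raw bytes. The sketch below is only a partial check, under two assumptions: the name is a byte string (not a decoded character string), and the code runs outside the scope of `use feature 'unicode_strings'`. In that case quotemeta backslashes each byte above 0x7F. Note that in UTF-8 every byte of a multibyte sequence is 0x80 or above, so no byte of such a sequence falls in the ASCII range; other encodings may differ. Whether a particular shell then treats each backslashed byte as the literal byte is not shown here.

```perl
use strict;
use warnings;

# The UTF-8 encoding of U+00E0 as two raw bytes (a byte string, not a
# decoded character string).
my $bytes = "\xC3\xA0";

# Outside "use feature 'unicode_strings'", quotemeta backslashes every
# byte above 0x7F in a byte string.
my $quoted = quotemeta($bytes);

# Show the result as hex bytes: 5C is the backslash.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $quoted;
print "$hex\n";    # 5C C3 5C A0
```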
ShellQuote works perfectly on all file names except those terminated by a NUL when creating a file. I have never ever had a failure with it.
So what to use? Just to be clear: shell quoting is not something I do often, since I usually just use Perl open to open a pipe to a process. That method does not suffer the shell issues discussed. I am interested since I have seen quotemeta used often for file name escaping.
(Thanks to Ether I have added IPC::System::Simple)
Test file:
use strict; use warnings; use autodie;
use String::ShellQuote;
use File::Find;
use File::Path;
use IPC::System::Simple 'capturex';
my @nasty_names;
my $top_dir = '/Users/andrew/bin/pipetestdir/testdir';
my $sub_dir = "easy_to_remove_me";
my (@qfail, @sfail, @ipcfail);
sub wanted {
    if ($File::Find::name) {
        my $rtr;
        my $exec1 = "ls " . quotemeta($File::Find::name);
        my $exec2 = "ls " . shell_quote($File::Find::name);
        my @exec3 = ("ls", $File::Find::name);
        $rtr = `$exec1`;
        push @qfail, "$exec1"
            if $rtr =~ /^\s*$/;
        $rtr = `$exec2`;
        push @sfail, "$exec2"
            if $rtr =~ /^\s*$/;
        $rtr = capturex(@exec3);
        push @ipcfail, \@exec3
            if $rtr =~ /^\s*$/;
    }
}
chdir($top_dir) or die "$!";
mkdir "$top_dir/$sub_dir";
chdir "$top_dir/$sub_dir";
push @nasty_names, "name with new line \n in the middle";
push @nasty_names, "name with CR \r in the middle";
push @nasty_names, "name with tab\tright there";
push @nasty_names, "utf \x{0061}\x{0300} combining diacritic";
push @nasty_names, "utf e̋ alt combining diacritic";
push @nasty_names, "utf e\x{cc8b} alt combining diacritic";
push @nasty_names, "utf άέᾄ greek";
push @nasty_names, 'back\slashes\\Not\\\at\\\\end';
push @nasty_names, qw|back\slashes\\IS\\\at\\\\end\\\\|;
sub create_nasty_files {
    for my $name (@nasty_names) {
        open my $fh, '>', $name;
        close $fh;
    }
}
for my $dir (@nasty_names) {
    chdir("$top_dir/$sub_dir");
    mkpath($dir);
    chdir $dir;
    create_nasty_files();
}
find(\&wanted, $top_dir);
print "\nquotemeta failed on:\n", join "\n", @qfail;
print "\nShell Quote failed on:\n", join "\n", @sfail;
print "\ncapturex failed on:\n", join "\n", @ipcfail;
print "\n\n\n",
"Remove \"$top_dir/$sub_dir\" before running again...\n\n";
Quotemeta is safe under these assumptions:
The shell violates rules 2 and 3 no matter what quote context you use -- outside of quotes, backslash-newline doesn't generate newline; in double-quotes, backslash-punctuation puts a backslash into the output (outside of a certain list of punctuation); and in single-quotes, everything is literal and backslash doesn't even protect you against a closing single-quote.
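To illustrate the single-quote context, here is a minimal sketch of the approach hobbs's comment describes: wrap the string in sh single quotes and handle embedded single quotes separately. The helper name sh_single_quote is hypothetical and this is not String::ShellQuote's actual code; the round trip assumes a POSIX sh with printf.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT String::ShellQuote's actual code: wrap the
# string in single quotes and rewrite each embedded single quote as
# '\'' (close quote, backslash-escaped quote, reopen quote).
sub sh_single_quote {
    my ($s) = @_;
    die "NUL cannot appear in a shell argument\n" if $s =~ /\0/;
    $s =~ s/'/'\\''/g;
    return "'$s'";
}

# Made-up name with a single quote and a newline in it.
my $name   = "it's a\nnasty name";
my $quoted = sh_single_quote($name);

# Assumes a POSIX sh with printf; the shell should hand the name back
# unchanged, newline included.
my $seen = `printf '%s' $quoted`;
print "round trip ok\n" if $seen eq $name;
```

Inside single quotes the newline needs no escaping at all, which is why this strategy handles the case that trips up quotemeta.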
I still recommend String::ShellQuote if you need to quote things for the shell. I also recommend avoiding letting the shell process your filenames entirely, if you can, by using LIST-form system/exec/open or IPC::Open2, IPC::Open3, or IPC::System::Simple.
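A minimal sketch of bypassing the shell with a list-form pipe open follows. The argument string is made up, and $^X (the running perl) stands in for an external command such as ls so the example does not depend on any particular file existing. List-form pipe opens are assumed to be available, as they are on Unix-like systems.

```perl
use strict;
use warnings;

# Hypothetical awkward argument, for illustration only.
my $name = "name with 'quotes', \$dollars and a\nnewline";

# List-form pipe open: the program and its arguments are handed to
# exec directly, so no shell ever parses $name. $^X (the running perl)
# stands in for an external command such as ls.
open(my $fh, '-|', $^X, '-e', 'print $ARGV[0]', $name)
    or die "cannot start child: $!";
my $seen = do { local $/; <$fh> };
close($fh) or die "child failed: $! $?";

print "argument arrived intact\n" if $seen eq $name;
```

Because no quoting step exists, there is nothing to get wrong; the only names this cannot pass are those containing NUL, which cannot appear in a process argument at all.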
As for things besides the shell... lots of different things violate one or more of the rules. For example, obsolete POSIX "basic" regexes and various kinds of editor regexes have punctuation characters that are non-special by default, but become special when preceded by backslash. Basically what I'm saying is, know the thing that you're feeding your data to very well, and escape properly. Only use quotemeta if it's an exact fit, or if you're using it for something that's not very important.