I have been trying to call exec with an argument that contains multibyte characters that come from an environment variable on Windows, but have not found a solution that works yet. Here is what I have been able to debug so far.
For simplicity's sake, assume that I have a directory called "Seán" that I am trying to pass as an argument to exec. If I just call
exec 'script', "Se\u00E1n".encode("IBM437")
the script that is exec'ed cannot find it, because the argument is altered in a way that loses the accented character. If I do the following instead, it works, but this is bad practice because the argument should be escaped before it is passed to the shell.
exec "script #{"Se\u00E1n".encode("IBM437")}"
So my idea was to use shellescape to protect the call to exec:
require 'shellwords'
exec "script #{"Se\u00E1n".encode("IBM437").shellescape}"
The problem is that shellescape escapes the accented character, so the argument becomes "Se\án". I tracked this down to the following regular expression in shellwords.rb:
str.gsub!(/([^A-Za-z0-9_\-.,:\/@\n])/, "\\\\\\1")
At first glance, this escapes every character that is not in a known-safe set of shell characters. Unfortunately, that set contains only ASCII letters, ASCII digits and a few punctuation marks, so non-ASCII letters such as á get escaped as well, and I run into problems.
What I am looking for is a regex that performs shell escaping without mangling accented (non-ASCII) characters, so that I can escape these arguments before passing them to exec.
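To reproduce the behaviour without calling exec, the regex quoted above can be applied directly to the string. This is a minimal sketch that assumes a UTF-8 source file; `old_shellescape` is a hypothetical helper name, and the Shellwords implementation shipped with your Ruby version may use a slightly different character set.

```ruby
# Hypothetical helper that applies the regex quoted from shellwords.rb.
# Every character outside the ASCII safe set gets a backslash in front of it.
def old_shellescape(str)
  str.gsub(/([^A-Za-z0-9_\-.,:\/@\n])/, "\\\\\\1")
end

escaped = old_shellescape("Se\u00E1n")
puts escaped        # prints Se\án
puts escaped.length # prints 5: the backslash is an extra character
```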
The regex /([^A-Za-z0-9_\-.,:\/@\n])/ only handles ASCII letters and digits, not all Unicode letters. [^...] is a negated character class: it matches every character other than those listed inside it. So letters such as Я, Ц and Ą are matched by that expression, because [A-Za-z] does not cover them, and each of them gets a backslash prepended.
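The effect of the negated class is easy to see by listing which characters of a sample string it matches; these are exactly the characters that would receive a backslash. A minimal sketch, assuming a UTF-8 source file (the constant name is illustrative):

```ruby
# The same negated class as in shellwords.rb: it matches anything that is
# NOT an ASCII letter, ASCII digit, or one of _ - . , : / @ newline.
ASCII_SAFE_COMPLEMENT = /[^A-Za-z0-9_\-.,:\/@\n]/

# scan returns every character the class matches.
matches = "Я1Цa_Ą".scan(ASCII_SAFE_COMPLEMENT)
puts matches.join(" ") # prints Я Ц Ą
```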
What you need is to replace the ASCII ranges with Unicode property classes, so that no Unicode letter or digit is escaped. To make it even safer, you can also add the mark class, so that diacritics (combining marks) are kept as well:
str.gsub(/([^\p{L}\p{M}\p{N}_.,:\/@\n-])/, "\\\\\\1")
Here, \p{L} matches any Unicode letter, \p{M} matches any combining mark (diacritic), and \p{N} matches any Unicode number character, including digits.
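The Unicode-aware class can be wrapped in a small helper that leaves accented letters alone while still escaping spaces and shell operators. A minimal sketch; `unicode_shellescape` is a hypothetical name, and, like the original, it only backslash-escapes, so it assumes the receiving shell treats a backslash as an escape character.

```ruby
# Hypothetical helper built around the Unicode-aware character class.
# Left unescaped: any letter (\p{L}), combining mark (\p{M}),
# number (\p{N}), plus _ . , : / @ newline and a literal hyphen.
UNSAFE_CHAR = /([^\p{L}\p{M}\p{N}_.,:\/@\n-])/

def unicode_shellescape(str)
  # Non-destructive: gsub returns a new string and leaves str unchanged.
  str.gsub(UNSAFE_CHAR, "\\\\\\1")
end

puts unicode_shellescape("Se\u00E1n")               # prints Seán
puts unicode_shellescape("Se\u00E1n & \u042F.txt")  # prints Seán\ \&\ Я.txt
```

The \p{M} class matters for decomposed text: "Sea\u0301n" (a plus a combining acute accent) also passes through unchanged.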
Note that a hyphen does not need to be escaped when placed at the start/end of the character class (or after a valid range or a shorthand character class).
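A quick check of the hyphen rule: between two characters a hyphen builds a range, while at the start or end of the class it is a literal hyphen. The classes below are arbitrary examples, and String#match? needs Ruby 2.4 or later.

```ruby
# Between two characters, '-' builds a range: a-c means a, b or c.
RANGE = /[a-c]/
# At the end (or start) of the class, '-' is just a literal hyphen.
TRAILING_HYPHEN = /[ac-]/
LEADING_HYPHEN  = /[-ac]/

puts "b".match?(RANGE)           # prints true: b lies inside a..c
puts "b".match?(TRAILING_HYPHEN) # prints false: the class is only a, c and -
puts "-".match?(TRAILING_HYPHEN) # prints true
puts "-".match?(LEADING_HYPHEN)  # prints true
```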
Escaped characters
Code
String.class_eval do
  # Escapes the string in place (gsub!) and returns it.
  def escapeshell
    # Escape shell special characters
    self.gsub!(/[#-&(-*;<>?\[-^`{-~\u00FF]/, '\\\\\0')
    # Escape unbalanced quotes (single and double quotes)
    self.gsub!(/(["'])(?:([^"']*(?:(?!\1)["'][^"']*)*)\1)?/) do
      if $2.nil?
        '\\' + $1
      else
        # and escape quotes inside (e.g. "x'x" or 'y"y')
        qt = $1
        qt + $2.gsub(/["']/, '\\\\\0') + qt
      end
    end
    self
  end
end
# Test it
str = "(dir *.txt & dir \"\\some dir\\Sè\u00E1ñ*.rb\") | sort /R >Filé.txt 2>&1"
puts 'String:'
puts str
puts "\nEscaped:"
puts str.escapeshell
Output
String:
(dir *.txt & dir "\some dir\Sèáñ*.rb") | sort /R >Filé.txt 2>&1
Escaped:
\(dir \*.txt \& dir "\\some dir\\Sèáñ\*.rb"\) \| sort /R \>Filé.txt 2\>\&1
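Because escapeshell calls gsub! on self, it modifies the receiver and raises FrozenError (Ruby 2.5+) on frozen strings, for example string literals under `# frozen_string_literal: true`. If that matters, the same two steps can be applied to a copy instead of monkey-patching String. A sketch with a hypothetical `ShellEscape.escape_shell` module function; the regexes are copied unchanged from escapeshell.

```ruby
module ShellEscape
  META  = /[#-&(-*;<>?\[-^`{-~\u00FF]/
  QUOTE = /(["'])(?:([^"']*(?:(?!\1)["'][^"']*)*)\1)?/

  # Returns an escaped copy; the argument itself is never modified.
  def self.escape_shell(str)
    # Step 1: prefix every shell metacharacter with a backslash.
    out = str.gsub(META, '\\\\\0')
    # Step 2: escape unbalanced quotes, and quotes nested inside a pair.
    out.gsub(QUOTE) do
      if $2.nil?
        '\\' + $1
      else
        qt = $1
        qt + $2.gsub(/["']/, '\\\\\0') + qt
      end
    end
  end
end

cmd = "(dir *.txt & dir \"\\some dir\\S\u00E8\u00E1\u00F1*.rb\") | sort /R >Fil\u00E9.txt 2>&1".freeze
puts ShellEscape.escape_shell(cmd) # same output as escapeshell, no FrozenError
```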
Metacharacters
Consider the shell metacharacters that should be escaped:
# & % ; ` | * ? ~ < > ^ ( ) [ ] { } $ \ \u00FF
We can include each character in the character class:
[#&%;`|*?~<>^()\[\]{}$\\\u00FF]
This matches exactly the same set of characters as the shorter, range-based class:
/[#-&(-*;<>?\[-^`{-~\u00FF]/
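One way to confirm that the range-based class matches the same set as the explicit list is to test both against every code point from U+0000 to U+00FF. A quick sketch (String#match? needs Ruby 2.4 or later; the constant names are illustrative):

```ruby
EXPLICIT = /[#&%;`|*?~<>^()\[\]{}$\\\u00FF]/
RANGES   = /[#-&(-*;<>?\[-^`{-~\u00FF]/

# Build every character in U+0000..U+00FF as a UTF-8 string.
chars = (0..0xFF).map { |cp| cp.chr(Encoding::UTF_8) }

explicit_hits = chars.select { |c| c.match?(EXPLICIT) }
range_hits    = chars.select { |c| c.match?(RANGES) }

puts explicit_hits == range_hits # prints true
puts range_hits.size             # prints 21
```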
Then we use gsub!() to prepend a backslash before every character that is in the class:
str.gsub!(/[#-&(-*;<>?\[-^`{-~\u00FF]/, '\\\\\0')
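The replacement string is easy to misread. In a single-quoted Ruby literal, '\\\\\0' holds four characters (three backslashes and a 0); gsub then reads the first two as one literal backslash and \0 as the whole match. A short sketch comparing it with an equivalent block form, which avoids the backreference syntax altogether:

```ruby
META = /[#-&(-*;<>?\[-^`{-~\u00FF]/

replacement = '\\\\\0'
puts replacement.length # prints 4

str = "a&b|c"
with_string = str.gsub(META, replacement)
# Block form: just prepend a backslash to each matched character.
with_block  = str.gsub(META) { |match| "\\#{match}" }

puts with_string               # prints a\&b\|c
puts with_string == with_block # prints true
```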
Quotes
Only unbalanced quotes need to be escaped; leaving balanced quotes alone preserves the command's quoted arguments. The following expression matches balanced quotes:
/(["'])([^"']*(?:(?!\1)["'][^"']*)*)\1/
To match unbalanced quotes as well, make everything after the opening quote optional:
/(["'])(?:([^"']*(?:(?!\1)["'][^"']*)*)\1)?/
But we also need to escape quotes nested inside another pair, that is, single quotes inside double quotes and vice versa. So we nest another gsub() that escapes the quotes within the text matched between the outer quotes ($2):
str.gsub!(/(["'])(?:([^"']*(?:(?!\1)["'][^"']*)*)\1)?/) do
  if $2.nil?
    '\\' + $1
  else
    qt = $1
    qt + $2.gsub(/["']/, '\\\\\0') + qt
  end
end
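To see both branches of the block at work, the snippet can be wrapped in a non-destructive helper and run on a string containing one paired quote with a nested quote inside, plus one stray quote. A sketch; `escape_quotes` is a hypothetical name.

```ruby
QUOTE = /(["'])(?:([^"']*(?:(?!\1)["'][^"']*)*)\1)?/

def escape_quotes(str)
  str.gsub(QUOTE) do
    if $2.nil?
      # No closing quote: escape the lone quote itself.
      '\\' + $1
    else
      # Balanced pair: keep the outer quotes, escape quotes inside them.
      qt = $1
      qt + $2.gsub(/["']/, '\\\\\0') + qt
    end
  end
end

puts escape_quotes(%q(echo "it's" 'x)) # prints echo "it\'s" \'x
```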