
How do I reverse escape backslash encodings like "\ " and "\303\266" in bash?

I have a script that records files with UTF-8 encoded names. However, the script's encoding/environment wasn't set up right, and it just recorded the raw bytes. I now have lots of lines in the file like this:

.../My\ Folders/My\ r\303\266m/...

So spaces in the filenames are escaped as "\ ", and UTF-8 encoded characters show up as octal escapes like \303\266 (which is ö). I want to reverse this encoding. Is there some easy set of bash commands I can chain together to do that?

I could write a huge pile of sed commands, but it'd take ages to list all the non-ASCII characters we have. Or I could start parsing it in Python. But I'm hoping there's some trick I can use.

Asked by Amandasaurus, Jan 23 '23 05:01


2 Answers

Here's a rough stab at the Unicode characters:

text="/My\ Folders/My\ r\303\266m/"
text="echo \$\'"$(echo "$text"|sed -e 's|\\|\\\\|g')"\'"
# the argument to the echo must not be quoted or escaped-quoted in the next step
text=$(eval "echo $(eval "$text")")
read text < <(echo "$text")
echo "$text"

This makes use of the $'string' quoting feature of Bash.

This outputs "/My Folders/My röm/".
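
To illustrate why the eval steps are needed, here is a small sketch of how $'string' quoting behaves in Bash (the variable names are only for illustration). The escapes are decoded when they appear literally in the script, at the time the shell parses the word. The same characters stored in a variable are left untouched, which is why the code above builds a command string and evals it.

```shell
# ANSI-C quoting: the escapes are decoded when the shell parses this word.
literal=$'r\303\266m'

# Ordinary single quotes: the backslashes and digits are kept verbatim.
stored='r\303\266m'

printf '%s\n' "$literal"   # the decoded name (shows as "röm" in a UTF-8 terminal)
printf '%s\n' "$stored"    # the raw text: r\303\266m
```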

As of Bash 4.4, it's as easy as:

text="/My Folders/My r\303\266m/"
echo "${text@E}"

This uses a new feature of Bash called parameter transformation. The E operator causes the parameter to be treated as if its contents were inside $'string', in which backslash-escaped sequences (in this case octal values) are evaluated.
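
The string in the example above already has plain spaces. The "\ " sequences from the question are not among the escapes that $'string' decodes, so as far as I know @E leaves them in place. A minimal sketch that handles both, assuming Bash 4.4 or later and assuming that "\ " only ever stands for an escaped space (the variable names are made up for illustration):

```shell
# Sample line from the question, kept verbatim by the single quotes.
escaped='/My\ Folders/My\ r\303\266m/'

# Step 1: replace every backslash-space pair with a plain space.
unspaced=${escaped//\\ / }

# Step 2: decode the remaining backslash escapes (requires Bash 4.4+).
decoded=${unspaced@E}

printf '%s\n' "$decoded"
```

A name that contains a literal backslash followed by a space would be mangled by step 1, so treat this as a sketch for data shaped like the sample in the question.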

Answered by Dennis Williamson, Jan 29 '23 06:01


It is not clear exactly what kind of escaping is being used. The octal character codes are C-style, but C does not escape spaces. Backslash-space is a shell escape, but ordinary shell quoting does not use octal character escapes.

Something close to C-style escaping can be undone using the command printf %b "$escaped" (quote the variable; unquoted, the shell would split it at the spaces). The documentation says that octal escapes start with \0, but that does not seem to be required by GNU printf. Another answer mentions read for unescaping shell escapes, although if space is the only escape that printf %b does not handle, then handling that case with sed would probably be better.
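
As a sketch of that suggestion, the following removes the backslash before each space with sed and then lets printf %b decode the octal escapes, one line at a time. The temporary file and its sample content are assumptions made for the example. It also relies on printf accepting octal escapes without a leading 0, which was observed above for GNU printf but may not hold for every implementation.

```shell
# Hypothetical input file holding one escaped path per line.
list=$(mktemp)
printf '%s\n' '/My\ Folders/My\ r\303\266m/' '/Other\ dir/plain' > "$list"

# sed turns "\ " into a space; printf %b then decodes the remaining escapes.
# IFS= and -r keep read from touching backslashes or leading whitespace.
decoded=$(
  sed 's/\\ / /g' "$list" |
  while IFS= read -r line; do
    printf '%b\n' "$line"
  done
)

printf '%s\n' "$decoded"
rm -f "$list"
```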

Answered by mark4o, Jan 29 '23 07:01