Is there an "escape converter" for file and directory names available?

The day came when I had to write a Bash script that walks arbitrary directory trees, looks at arbitrary files, and tries to determine something by comparing them. I thought it would be a simple process, a couple of hours tops. Not so!

My hangup is that sometimes some idiot -ahem!- excuse me, lovely user chooses to put spaces in directory and file names. This causes my script to fail.

The perfect solution, aside from threatening the guillotine for those who insist on using spaces in such places (not to mention the guys who put this in operating systems' code!), might be a routine that "escapes" the file and directory names for us, kind of like how cygwin has routines to convert from unix to dos filename formats. Is there anything like this in a standard Unix / Linux distribution?

Note that the simple for file in * construct doesn't work so well when one is trying to compare directory trees, as it ONLY works on "the current directory", and in this case, as in many others, constantly CDing to various directory locations brings its own problems. In doing my homework, I found the question "Handle special characters in bash for...in loop". The solution proposed there hangs up on spaces in directory names, but that can be overcome simply, like this:

dir="dirname with spaces"
ls -1 "$dir" | while IFS= read -r x; do
   echo "$x"
done

PLEASE NOTE: The above code isn't particularly wonderful because the variables used inside the while loop are INACCESSIBLE outside that while loop. This is because there's an implied subshell created when the ls command's output is piped. This is a key motivating factor to my query!

...OK, the code above helps for many situations but "escaping" the characters would be pretty powerful too. For example, dir above might contain:

dir\ with\ spaces

Does this already exist and I've just been overlooking it?

If not, does anyone have an easy proposal to write one - maybe with sed or lex? (I'm far from competent with either.)

Richard T asked Dec 19 '09 20:12



1 Answer

Make a really nasty filename for testing:

mkdir escapetest
cd escapetest && touch "m'i;x&e\"d u(p\nmulti)\nlines'\nand\015ca&rr\015re;t"

[ Edit: Chances are that I intended that touch command to be:

touch $'m\'i;x&e\"d u(p\nmulti)\nlines\'\nand\015ca&rr\015re;t'

which puts more ugly characters in the filename. The output will look a little different. ]

Then run this:

find . -print0 | while IFS= read -d '' -r line; do
    echo -en "--[${line}]--\t\t"
    echo "$line" | sed -e ':t;N;s/\n/\\n/;bt' |
        sed 's/\([ \o47()"&;\\]\)/\\\1/g;s/\o15/\\r/g'
done

The output should look like this:

--[./m'i;x&e"d u(p
multi)
lines'
re;t]--         ./m\'i\;x\&e\"d\ u\(p\\nmulti\)\\nlines\'\\nand\\015ca\&rr\\015re\;t

This consists of a condensed version of Pascal Thivent's sed monster, plus handling for carriage returns and newlines and maybe a bit more.

The first sed pass joins the lines of a multi-line filename into one, replacing each newline with a literal "\n". The second pass puts a backslash in front of each character from a fixed list (space, single quote, parentheses, double quote, ampersand, semicolon, and backslash). Its last expression replaces carriage returns with "\r".

As you know, a while read loop copes with spaces in names where a for loop over command output does not. By having find terminate each name with a null character (-print0) and setting read's delimiter to null (-d ''), you can also handle newlines in filenames. The -r option makes read accept backslashes without interpreting them.
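To see why the null delimiter matters, here is a minimal sketch that counts how many records each approach reads from the same tree. The scratch directory and the file names in it are made up for the illustration, and the IFS= prefix is an addition that keeps read from trimming leading or trailing whitespace in a name.

```shell
#!/usr/bin/env bash
# Scratch tree with hypothetical names: one plain, one with a space, one with a newline.
tmp=$(mktemp -d)
mkdir "$tmp/dir with spaces"
touch "$tmp/dir with spaces/plain" \
      "$tmp/dir with spaces/two words" \
      "$tmp/dir with spaces/"$'line\nbreak'

# Newline-delimited: the name containing a newline is read as two records.
marks=$(find "$tmp" -type f | while IFS= read -r f; do printf x; done)
by_line=${#marks}

# NUL-delimited: one record per file, whatever the name contains.
marks=$(find "$tmp" -type f -print0 | while IFS= read -r -d '' f; do printf x; done)
by_nul=${#marks}

echo "newline-delimited: $by_line, NUL-delimited: $by_nul"
rm -rf "$tmp"
```

With three files in the tree, the newline-delimited loop should report four records and the null-delimited loop three. Spaces alone cause no trouble in either loop; only the embedded newline does.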

Edit:

Another way to escape the special characters, this time without using sed, uses the quoting and variable-creating feature of the Bash printf builtin (this also illustrates using process substitution rather than a pipe):

while IFS= read -d '' -r file; do
    echo "$file"
    printf -v name '%q' "$file"
    echo "$name"
done < <(find . -print0)

The variable $name will be available outside the loop, since using process substitution prevents the creation of a subshell around the loop.
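The difference between the pipe and process substitution can be checked with a small sketch like the one below. It assumes a non-interactive Bash with default options (no lastpipe), and the scratch directory, file names, and variable names are all invented for the demonstration. It also feeds the %q result back through eval to show that the escaped form is reusable as shell input.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory with two awkward names.
tmp=$(mktemp -d)
touch "$tmp/a b" "$tmp/c;d"

# Pipe: the loop body runs in a subshell, so the increments are lost.
piped=0
find "$tmp" -type f -print0 | while IFS= read -r -d '' f; do
    piped=$((piped + 1))
done

# Process substitution: the loop runs in the current shell.
substituted=0
while IFS= read -r -d '' f; do
    substituted=$((substituted + 1))
    printf -v quoted '%q' "$f"   # escaped form of the name
    last=$f                      # raw form, kept for comparison
done < <(find "$tmp" -type f -print0)

echo "piped=$piped substituted=$substituted"

# Re-parsing the %q output should give back the original name.
eval "roundtrip=$quoted"
[ "$roundtrip" = "$last" ] && echo "round trip ok"

rm -rf "$tmp"
```

Under those assumptions, piped stays at 0 while substituted reaches 2, and the eval line recovers the original name from its escaped form.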

Dennis Williamson answered Oct 15 '22 20:10