 

Cross-platform, safe to use command line string separator

For a new feature in PyInstaller, we need a command line option that receives a single string with a separator character in it. Here's the discussion: https://github.com/pyinstaller/pyinstaller/pull/1990.

Example:

pyinstaller --add-data="file.txt?dir"

? is the separator here, but it should be replaced by another character. It's not guaranteed that the string is quoted!

We've thought about ; : > < | * and so on, but we can't figure out which character would be safe to use: one without side effects, platform independent, and hopefully not allowed in paths. For example, > redirects stdout, ; is the command separator on POSIX, etc.

Any ideas what character we can use?

asked May 18 '16 at 16:05 by linusg


1 Answer

The real problem and its solution

Your question is, to some extent, an instance of the XY problem. A red herring at least.

As I show below, no ideal path delimiters exist, and therefore you have to pass that information in separate command-line options, if you really insist on supporting arbitrarily crazy paths. It is up to the users, then, to escape their weird characters in paths when calling your program.
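As a minimal sketch of the "separate command-line options" approach: the source and destination are passed as two distinct arguments, so no delimiter is needed at all. The option name --add-data comes from the question, but the two-argument form, the SRC/DEST names and the parse_add_data helper are assumptions made for illustration, not PyInstaller's actual interface.

```python
import argparse


def parse_add_data(argv):
    """Parse a hypothetical --add-data option taking SRC and DEST as two arguments."""
    parser = argparse.ArgumentParser(prog="pyinstaller-sketch")  # hypothetical program name
    parser.add_argument(
        "--add-data",
        nargs=2,                    # two separate shell words, so no delimiter is needed
        metavar=("SRC", "DEST"),
        action="append",            # the option may be repeated
        default=[],
        help="source path and destination directory, as two separate arguments",
    )
    args = parser.parse_args(argv)
    return [tuple(pair) for pair in args.add_data]


# Quoting and escaping of odd characters is left to the caller's shell.
print(parse_add_data([
    "--add-data", "file.txt", "dir",
    "--add-data", "we?ird:name;.txt", "other dir",
]))
```

With this shape, any character the shell lets through can appear in either path, because the two values never share one string.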

No ideal path delimiter exists

Unix paths can contain any characters except ASCII NUL (\0). Path components (file names) are not allowed to contain slash (/). Anything else is OK, according to POSIX.

Therefore, your constraints are too tight. No ideal solution to your problem exists even on Unix, completely ignoring the portability issue.

Good path delimiters

You have to put some “common sense” constraints on paths, e.g. that they will not contain a semicolon on Windows or a colon on Unix. This combination is quite natural, intuitive and easy to read, by the way, because these characters already separate the entries of path lists (such as PATH) on these systems.
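A minimal sketch of that compromise in Python, assuming the option value has the form SRC<sep>DEST: os.pathsep is ";" on Windows and ":" on POSIX systems, so it gives the per-platform separator directly. The split_add_data helper and its strict "exactly one separator" rule are illustrative assumptions, not something taken from PyInstaller.

```python
import os


def split_add_data(value, sep=os.pathsep):
    """Split 'SRC<sep>DEST' into (SRC, DEST), requiring exactly one separator.

    sep defaults to os.pathsep (';' on Windows, ':' on POSIX).
    """
    if value.count(sep) != 1:
        raise ValueError(f"expected exactly one {sep!r} in {value!r}")
    src, _, dest = value.partition(sep)
    if not src or not dest:
        raise ValueError(f"empty source or destination in {value!r}")
    return src, dest


# The separator is passed explicitly here so the example behaves the same everywhere.
print(split_add_data("file.txt:dir", sep=":"))
print(split_add_data("C:\\data\\file.txt;dir", sep=";"))
```

Rejecting values with more than one separator enforces the "common sense" constraint explicitly instead of guessing which occurrence was meant.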

Let’s find out whether you can reserve just one character that may never occur in a path. Will the set of constraints be satisfiable then?

If you list the non-alphanumeric printable ASCII characters and remove those with special meaning for the Unix shell and those used in paths even by sane people (_, -, etc.), you can pick a reasonable path delimiter:

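# Export the C locale so that grep's [:print:] and [:alnum:] cover ASCII only.
export LC_ALL=C
# Print ASCII 1..127 one per line, keep the printable characters, then drop
# shell specials, alphanumerics and characters common in sane paths.
awk 'BEGIN{ for (i=1;i<ARGC;i++) printf "%c\n", ARGV[i]+0; }' {1..127} |
    grep '^[[:print:]]$' |
    grep '^[^][*?~$`"'\''&|#\<>(){}!;/[:alnum:] ._-]$'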

ASCII is 0..127, but 0 is excluded as it causes trouble with text-oriented utilities. Bash specials are filtered out, too.

The resulting set contains just seven characters, though: %+,:=@^

Aaah, percent (%) and caret (^) unfortunately have special meaning in cmd.exe, and colon (:) is special in Windows paths. Only four remain: plus (+), comma (,), equals (=) and at sign (@).
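The same filtering can be cross-checked in Python. This is only a sketch under the same assumptions as the shell pipeline: the sets below are copied from the character class used in the grep call and from the cmd.exe and Windows remarks, and all names are made up for this example.

```python
# Characters excluded by the grep bracket expression, grouped by reason.
SHELL_SPECIALS = set("][*?~$`\"'&|#\\<>(){}!;/")
COMMON_IN_PATHS = set(" ._-")
# Special in cmd.exe (% and ^) or in Windows paths (:).
WINDOWS_SPECIALS = set("%^:")


def candidate_separators():
    """Return printable, non-alphanumeric ASCII characters that survive the Unix filter."""
    printable = (chr(i) for i in range(1, 128) if chr(i).isprintable())
    return "".join(
        c for c in printable
        if not c.isalnum() and c not in SHELL_SPECIALS and c not in COMMON_IN_PATHS
    )


def portable_separators():
    """Additionally drop the characters that are special on Windows."""
    return "".join(c for c in candidate_separators() if c not in WINDOWS_SPECIALS)


print(candidate_separators())  # %+,:=@^
print(portable_separators())   # +,=@
```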

Either you pick one of those, or you decide they are inconvenient and revise the list of specials to pick a different character for each system (e.g. the colon and semicolon compromise suggested above), which relaxes the portability constraint a bit. Or maybe the tilde (~) is not that special in the shell, as it is expanded to the home directory path only at the start of a shell word. Or maybe you do not want a separator character but a separator string: you can guess that very few files have @@@ in their names.
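A minimal sketch of the separator-string variant, assuming @@@ as the separator. The split_on_separator_string helper is hypothetical, and splitting at the first occurrence is just one possible choice.

```python
SEPARATOR = "@@@"  # assumed separator string; very few file names contain it


def split_on_separator_string(value, sep=SEPARATOR):
    """Split 'SRC@@@DEST' at the first occurrence of the separator string."""
    src, found, dest = value.partition(sep)
    if not found:
        raise ValueError(f"missing separator {sep!r} in {value!r}")
    return src, dest


# A single @ inside a file name does not collide with the three-character separator.
print(split_on_separator_string("file.txt@@@dir"))
print(split_on_separator_string("user@host.txt@@@dir"))
```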

answered Oct 21 '22 at 18:10 by Palec