I have command line arguments in a string and I need to split it to feed to argparse.ArgumentParser.parse_args
.
I see that the documentation uses string.split()
plentifully. However in complex cases, this does not work, such as
--foo "spaces in brakets" --bar escaped\ spaces
Is there a functionality to do that in python?
(A similar question for java was asked here).
To split the line in Python, use the String split() method. The split() is an inbuilt method that returns a list of lines after breaking the given string by the specified separator. In this tutorial, the line is equal to the string because there is no concept of a line in Python. So you can think of a line as a string.
The shlex module defines the following functions: shlex. split (s, comments=False, posix=True) Split the string s using shell-like syntax. If comments is False (the default), the parsing of comments in the given string will be disabled (setting the commenters attribute of the shlex instance to the empty string).
Each argument on the command line is separated by one or more spaces, and the operating system places each argument directly into its own null-terminated string. The second parameter passed to main() is an array of pointers to the character strings containing each argument (char *argv[]).
split() method accepts two arguments. The first optional argument is separator , which specifies what kind of separator to use for splitting the string. If this argument is not provided, the default value is any whitespace, meaning the string will split whenever .
This is what shlex.split
was created for.
If you're parsing a windows-style command line, then shlex.split
doesn't work correctly - calling subprocess
functions on the result will not have the same behavior as passing the string directly to the shell.
In that case, the most reliable way to split a string like the command-line arguments to python is... to pass command line arguments to python:
import sys
import subprocess
import shlex
import json # json is an easy way to send arbitrary ascii-safe lists of strings out of python
def shell_split(cmd):
"""
Like `shlex.split`, but uses the Windows splitting syntax when run on Windows.
On windows, this is the inverse of subprocess.list2cmdline
"""
if os.name == 'posix':
return shlex.split(cmd)
else:
# TODO: write a version of this that doesn't invoke a subprocess
if not cmd:
return []
full_cmd = '{} {}'.format(
subprocess.list2cmdline([
sys.executable, '-c',
'import sys, json; print(json.dumps(sys.argv[1:]))'
]), cmd
)
ret = subprocess.check_output(full_cmd).decode()
return json.loads(ret)
One example of how these differ:
# windows does not treat all backslashes as escapes
>>> shell_split(r'C:\Users\me\some_file.txt "file with spaces"', 'file with spaces')
['C:\\Users\\me\\some_file.txt', 'file with spaces']
# posix does
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"')
['C:Usersmesome_file.txt', 'file with spaces']
# non-posix does not mean Windows - this produces extra quotes
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"', posix=False)
['C:\\Users\\me\\some_file.txt', '"file with spaces"']
You could use the split_arg_string
helper function from the click
package:
import re
def split_arg_string(string):
"""Given an argument string this attempts to split it into small parts."""
rv = []
for match in re.finditer(r"('([^'\\]*(?:\\.[^'\\]*)*)'"
r'|"([^"\\]*(?:\\.[^"\\]*)*)"'
r'|\S+)\s*', string, re.S):
arg = match.group().strip()
if arg[:1] == arg[-1:] and arg[:1] in '"\'':
arg = arg[1:-1].encode('ascii', 'backslashreplace') \
.decode('unicode-escape')
try:
arg = type(string)(arg)
except UnicodeError:
pass
rv.append(arg)
return rv
For example:
>>> print split_arg_string('"this is a test" 1 2 "1 \\" 2"')
['this is a test', '1', '2', '1 " 2']
The click
package is starting to dominate for command-arguments parsing, but I don't think it supports parsing arguments from string (only from argv
). The helper function above is used only for bash
completion.
Edit: I can nothing but recommend to use the shlex.split()
as suggested in the answer by @ShadowRanger. The only reason I'm not deleting this answer is because it provides a little bit faster splitting then the full-blown pure-python tokenizer used in shlex
(around 3.5x faster for the example above, 5.9us vs 20.5us). However, this shouldn't be a reason to prefer it over shlex
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With