Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to split a string into command line arguments like the shell in python?

I have command line arguments in a string and I need to split it to feed to argparse.ArgumentParser.parse_args.

I see that the documentation uses string.split() plentifully. However in complex cases, this does not work, such as

--foo "spaces in brakets"  --bar escaped\ spaces

Is there a functionality to do that in python?

(A similar question for java was asked here).

like image 544
P-Gn Avatar asked Jul 06 '17 10:07

P-Gn


People also ask

How do you split a string into lines in Python?

To split the line in Python, use the String split() method. The split() is an inbuilt method that returns a list of lines after breaking the given string by the specified separator. In this tutorial, the line is equal to the string because there is no concept of a line in Python. So you can think of a line as a string.

What is Shlex split () in Python?

The shlex module defines the following functions: shlex. split (s, comments=False, posix=True) Split the string s using shell-like syntax. If comments is False (the default), the parsing of comments in the given string will be disabled (setting the commenters attribute of the shlex instance to the empty string).

How do you separate command line arguments?

Each argument on the command line is separated by one or more spaces, and the operating system places each argument directly into its own null-terminated string. The second parameter passed to main() is an array of pointers to the character strings containing each argument (char *argv[]).

Can split () take two arguments?

split() method accepts two arguments. The first optional argument is separator , which specifies what kind of separator to use for splitting the string. If this argument is not provided, the default value is any whitespace, meaning the string will split whenever .


3 Answers

This is what shlex.split was created for.

like image 174
ShadowRanger Avatar answered Sep 17 '22 17:09

ShadowRanger


If you're parsing a windows-style command line, then shlex.split doesn't work correctly - calling subprocess functions on the result will not have the same behavior as passing the string directly to the shell.

In that case, the most reliable way to split a string like the command-line arguments to python is... to pass command line arguments to python:

import sys
import subprocess
import shlex
import json  # json is an easy way to send arbitrary ascii-safe lists of strings out of python

def shell_split(cmd):
    """
    Like `shlex.split`, but uses the Windows splitting syntax when run on Windows.

    On windows, this is the inverse of subprocess.list2cmdline
    """
    if os.name == 'posix':
        return shlex.split(cmd)
    else:
        # TODO: write a version of this that doesn't invoke a subprocess
        if not cmd:
            return []
        full_cmd = '{} {}'.format(
            subprocess.list2cmdline([
                sys.executable, '-c',
                'import sys, json; print(json.dumps(sys.argv[1:]))'
            ]), cmd
        )
        ret = subprocess.check_output(full_cmd).decode()
        return json.loads(ret)

One example of how these differ:

# windows does not treat all backslashes as escapes
>>> shell_split(r'C:\Users\me\some_file.txt "file with spaces"', 'file with spaces')
['C:\\Users\\me\\some_file.txt', 'file with spaces']

# posix does
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"')
['C:Usersmesome_file.txt', 'file with spaces']

# non-posix does not mean Windows - this produces extra quotes
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"', posix=False)
['C:\\Users\\me\\some_file.txt', '"file with spaces"']  
like image 42
Eric Avatar answered Sep 17 '22 17:09

Eric


You could use the split_arg_string helper function from the click package:

import re

def split_arg_string(string):
    """Given an argument string this attempts to split it into small parts."""
    rv = []
    for match in re.finditer(r"('([^'\\]*(?:\\.[^'\\]*)*)'"
                             r'|"([^"\\]*(?:\\.[^"\\]*)*)"'
                             r'|\S+)\s*', string, re.S):
        arg = match.group().strip()
        if arg[:1] == arg[-1:] and arg[:1] in '"\'':
            arg = arg[1:-1].encode('ascii', 'backslashreplace') \
                .decode('unicode-escape')
        try:
            arg = type(string)(arg)
        except UnicodeError:
            pass
        rv.append(arg)
    return rv

For example:

>>> print split_arg_string('"this is a test" 1 2 "1 \\" 2"')
['this is a test', '1', '2', '1 " 2']

The click package is starting to dominate for command-arguments parsing, but I don't think it supports parsing arguments from string (only from argv). The helper function above is used only for bash completion.

Edit: I can nothing but recommend to use the shlex.split() as suggested in the answer by @ShadowRanger. The only reason I'm not deleting this answer is because it provides a little bit faster splitting then the full-blown pure-python tokenizer used in shlex (around 3.5x faster for the example above, 5.9us vs 20.5us). However, this shouldn't be a reason to prefer it over shlex.

like image 23
randomir Avatar answered Sep 17 '22 17:09

randomir