
Report invalid options first (or use regular expressions) with the Python argparse module

When using the argparse module in Python I am looking for a way to trap invalid options and report them better. The documentation at https://docs.python.org/3/library/argparse.html#invalid-arguments provides an example:

parser = argparse.ArgumentParser(prog='PROG')
parser.add_argument('--foo', type=int)
parser.add_argument('bar', nargs='?')

# invalid option
parser.parse_args(['--bar'])
usage: PROG [-h] [--foo FOO] [bar]
PROG: error: no such option: --bar

However, it is quite easy to trip this up, because bad options are not reported first. For example:

import argparse
import datetime

def convertIsoTime(timestamp):
    """read ISO-8601 time-stamp using the AMS conventional format YYYY-MM-DDThh:mm:ssUTC"""
    try:
        return datetime.datetime.strptime(timestamp,"%Y-%m-%dT%H:%M:%SUTC")
    except ValueError:
        raise argparse.ArgumentTypeError("'{}' is not a valid ISO-8601 time-stamp".format(timestamp))

parser = argparse.ArgumentParser()
parser.add_argument('startTime', type=convertIsoTime)
parser.add_argument('--good', type=int,
                    help='foo')

args = parser.parse_args(['--gold','5','2015-01-01T00:00:00UTC'])

will report:

error: argument startTime: '5' is not a valid ISO-8601 time-stamp

whereas I would prefer it to report the more useful:

error: no such option: --gold

Is it possible to achieve this? It seems to me quite a basic use case. When writing argument parsers directly, I typically use a pattern such that anything starting with a -- option prefix that is not a known option is rejected immediately. For example, in bash:

# Process command-line arguments
while [ $# -gt 0 ]; do
   case "$1" in
   --debug)
      DEBUGOPTION="--debug"
      shift
      break;;
   --)
      shift
      break;;
   --*)
      handleUsageError "$1"
      shift;;
   *)
      break;;
   esac
done

I believe argparse uses regular expressions internally, but I don't think they are accessible via add_argument().

Is there any way to do the equivalent easily with argparse?

Bruce Adams asked Nov 09 '22 09:11


1 Answer

The short answer is that parse_args uses parse_known_args. That method collects unknown arguments such as --gold in an extras list instead of failing on them at once. As a result, argument type errors get raised before unknown-argument errors.

I've added a solution that involves subclassing ArgumentParser and modifying a method deep in its calling stack.


I'll try to outline parse_args as applied to your example.

The first thing it does is categorize the strings as either O or A. Put simply, ones that begin with - are O, others A. It also tries to match the O ones with a defined argument.

In your example, it finds OAA. Regex is used to match this string against patterns defined by each argument's nargs. (If needed, I can explain this step in more detail.)
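The classification step can be illustrated with a deliberately simplified sketch. The function name arg_pattern is hypothetical and is not part of argparse; the real code also special-cases '--', negative numbers, strings containing spaces, and abbreviations, none of which is modelled here.

```python
def arg_pattern(argv):
    """Return a string of 'O'/'A' codes, one per command-line string.

    Simplified on purpose: anything longer than one character that starts
    with '-' is treated as an optional ('O'), everything else as an
    argument ('A'). Real argparse applies several more rules.
    """
    return ''.join('O' if s.startswith('-') and len(s) > 1 else 'A' for s in argv)


pattern = arg_pattern(['--gold', '5', '2015-01-01T00:00:00UTC'])
print(pattern)  # OAA
```

Note that --gold is coded as O even though no such option is defined; whether it matches a defined argument is a separate question.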

--gold does not match; at some point (whether in this initial loop or later) it gets put into an extras list. (I'll check the code for details).

For the 2nd loop through the strings it alternately tries to handle positionals and optionals.

It's when trying to match the 5 with startTime that your type function raises the type error, which propagates up to printing the usage and exiting. Because --gold is not defined, 5 is not consumed as an optional's argument. Thus it gets parsed as the first positional string. (Some kinds of optionals take 0 arguments, so it does not assume that anything following a --... string is an optional's argument.)

I think that, without the 5, the last string would match startTime. parse_known_args would return with --gold in the extras list. parse_args uses parse_known_args but raises an error when extras is not empty.
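That guess is easy to check with parse_known_args directly. This is a minimal sketch that rebuilds the parser from the question; convert_iso_time is just a renamed copy of the question's convertIsoTime.

```python
import argparse
import datetime


def convert_iso_time(timestamp):
    """Parse a time-stamp in the YYYY-MM-DDThh:mm:ssUTC form used in the question."""
    try:
        return datetime.datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SUTC")
    except ValueError:
        raise argparse.ArgumentTypeError(
            "'{}' is not a valid ISO-8601 time-stamp".format(timestamp))


parser = argparse.ArgumentParser()
parser.add_argument('startTime', type=convert_iso_time)
parser.add_argument('--good', type=int, help='foo')

# Same command line as in the question, but without the '5'.
namespace, extras = parser.parse_known_args(['--gold', '2015-01-01T00:00:00UTC'])
print(extras)               # ['--gold']
print(namespace.startTime)  # 2015-01-01 00:00:00
```

Calling parse_args with the same list would instead exit with an error, because extras is not empty.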

So in a sense the parser does detect both errors, but it's the startTime one that triggers the error message. It waits until the end to complain about the unrecognized --gold.

As a general philosophy, argparse does not try to detect and present all errors. It does not collect a list of errors to present in one final comprehensive message.

I'll review the code to check the details. I don't think you can easily change the basic parsing pattern. If I think of a way to force an earlier unrecognized option error, I'll edit this answer.
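One way to get the early rejection without touching the parsing pattern is to pre-scan the strings before calling parse_args, much like the bash loop in the question. The following is only a sketch under stated assumptions: the helper reject_unknown_long_options and the hand-maintained KNOWN_LONG_OPTIONS set are hypothetical names, not part of argparse.

```python
import argparse

# Hand-maintained set of the long options the parser defines (an assumption
# of this sketch; it has to be kept in sync with the add_argument calls).
KNOWN_LONG_OPTIONS = {'--good', '--help'}


def reject_unknown_long_options(parser, argv):
    """Call parser.error() for the first '--name' string that is not known."""
    for arg in argv:
        if arg == '--':                  # everything after '--' is positional
            break
        if arg.startswith('--'):
            name = arg.split('=', 1)[0]  # also accept the '--good=5' form
            if name not in KNOWN_LONG_OPTIONS:
                parser.error('no such option: %s' % name)


parser = argparse.ArgumentParser()
parser.add_argument('startTime')
parser.add_argument('--good', type=int, help='foo')

argv = ['--good', '5', '2015-01-01T00:00:00UTC']
reject_unknown_long_options(parser, argv)  # passes silently for known options
args = parser.parse_args(argv)
```

The trade-offs are real: this check rejects abbreviations such as --goo, which argparse accepts by default, and it would misfire on an option value that itself begins with --.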


def _parse_optional(self, arg_string): tries to classify an argv string. If the string looks like a positional, it returns None. If it matches an Action option_string, it returns a tuple '(action, option_string, None)' with the matching action. Finally, if there is no match, it returns:

    # it was meant to be an optional but there is no such option
    # in this parser (though it might be a valid option in a subparser)
    return None, arg_string, None

I think that is what happens with your --gold. Note the reason why it might still be a valid option.

This function is called by

def _parse_known_args(self, arg_strings, namespace):
    ...
    for i, arg_string in enumerate(arg_strings_iter):
        ...
        option_tuple = self._parse_optional(arg_string)
        if option_tuple is None:
            pattern = 'A'
        else:
            option_string_indices[i] = option_tuple
            pattern = 'O'
        arg_string_pattern_parts.append(pattern)
    ...
    # at the end
    # return the updated namespace and the extra arguments
    return namespace, extras

collecting that 'OAA' pattern, as well as a list of these tuples.

During a 2nd loop it alternates between consuming positionals and optionals. The function that consumes an optional is:

def consume_optional(start_index):
    option_tuple = option_string_indices[start_index]
    action, option_string, explicit_arg = option_tuple
    if action is None:
        extras.append(arg_strings[start_index])
    else:
        ...
        take_action(action, args, option_string)

As I wrote earlier, your --gold gets put on the extras list, while 5 remains on the list of arguments that can be parsed as positionals.

The namespace and extras are passed on through parse_known_args to you, the user, or to parse_args.

Conceivably you could subclass ArgumentParser and define a modified _parse_optional method. It could raise an error instead of returning that (None, arg_string, None) tuple.

import argparse
import datetime

class MyParser(argparse.ArgumentParser):
    def _parse_optional(self, arg_string):
        arg_tuple = super(MyParser, self)._parse_optional(arg_string)
        if arg_tuple is None:
            return arg_tuple  # positional
        else:
            if arg_tuple[0] is not None:
                return arg_tuple # valid optional
            else:
                msg = 'error: no such option: %s'%arg_string
                self.error(msg)

def convertIsoTime(timestamp):
    """read ISO-8601 time-stamp using the AMS conventional format YYYY-MM-DDThh:mm:ssUTC"""
    try:
        return datetime.datetime.strptime(timestamp,"%Y-%m-%dT%H:%M:%SUTC")
    except ValueError:
        raise argparse.ArgumentTypeError("'{}' is not a valid ISO-8601 time-stamp".format(timestamp))

# parser = argparse.ArgumentParser()
parser = MyParser()
parser.add_argument('startTime', type=convertIsoTime)
parser.add_argument('--good', type=int,
                    help='foo')

args = parser.parse_args(['--good','5','2015-01-01T00:00:00UTC'])
print(args)

args = parser.parse_args(['--gold','5','2015-01-01T00:00:00UTC'])

produces

1505:~/mypy$ python3 stack31317166.py 
Namespace(good=5, startTime=datetime.datetime(2015, 1, 1, 0, 0))
usage: stack31317166.py [-h] [--good GOOD] startTime
stack31317166.py: error: error: no such option: --gold

(The doubled "error: error:" appears because self.error() already prefixes the message with "error:"; dropping that prefix from msg leaves a single one.)

Subclassing to provide custom behavior is good argparse (and Python) practice.
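If overriding a private method feels too fragile (its return value is an implementation detail and may differ between Python versions), another option is to defer the time-stamp conversion until after parse_args has run. This is a sketch, not a drop-in fix: the parse wrapper and the convert_iso_time name are illustrative, and only the positional's conversion is deferred.

```python
import argparse
import datetime


def convert_iso_time(timestamp):
    """Parse a time-stamp in the YYYY-MM-DDThh:mm:ssUTC form used in the question."""
    try:
        return datetime.datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SUTC")
    except ValueError:
        raise argparse.ArgumentTypeError(
            "'{}' is not a valid ISO-8601 time-stamp".format(timestamp))


def parse(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('startTime')  # kept as a plain string for now
    parser.add_argument('--good', type=int, help='foo')

    # Unknown options are rejected here, before any time-stamp conversion.
    args = parser.parse_args(argv)

    # Convert afterwards and report failures through the usual error path.
    try:
        args.startTime = convert_iso_time(args.startTime)
    except argparse.ArgumentTypeError as exc:
        parser.error('argument startTime: {}'.format(exc))
    return args


print(parse(['--good', '5', '2015-01-01T00:00:00UTC']))
```

With ['--gold', '5', '2015-01-01T00:00:00UTC'] this should exit with "unrecognized arguments: --gold 2015-01-01T00:00:00UTC", since 5 is now accepted as startTime and the leftover strings are reported together. The wording differs from "no such option", but the bad option is named first.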

If you want more consideration of this case by Python developers, consider filing a bug/issue (a PEP is for more developed formal ideas). But there is quite a backlog of argparse bugs/patches, and a lot of caution about backwards compatibility.


http://bugs.python.org/issue?%40columns=id%2Cactivity%2Ctitle%2Ccreator%2Cassignee%2Cstatus%2Ctype&%40sort=-activity&%40filter=status&%40action=searchid&ignore=file%3Acontent&%40search_text=_parse_optional&submit=search&status=-1%2C1%2C2%2C3

is a list of bugs/issues that reference _parse_optional. Possible changes include how ambiguous optionals are handled. (I'll scan them to see if I'm forgetting anything. Some of the patches are mine.) But by using super, my suggested change is not affected by changes within the function. It's affected only by changes in how the function is called and what it returns, which is much less likely to occur. By filing your own issue, you at least put the developers on notice that someone depends on this interface.

hpaulj answered Nov 14 '22 21:11