When using the argparse module in Python I am looking for a way to trap invalid options and report them better. The documentation at https://docs.python.org/3/library/argparse.html#invalid-arguments provides an example:
parser = argparse.ArgumentParser(prog='PROG')
parser.add_argument('--foo', type=int)
parser.add_argument('bar', nargs='?')
# invalid option
parser.parse_args(['--bar'])
usage: PROG [-h] [--foo FOO] [bar]
PROG: error: no such option: --bar
However it is quite easy to trip this up as bad options are not reported first. For example:
import argparse
import datetime

def convertIsoTime(timestamp):
    """read ISO-8601 time-stamp using the AMS conventional format YYYY-MM-DDThh:mm:ssUTC"""
    try:
        return datetime.datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SUTC")
    except ValueError:
        raise argparse.ArgumentTypeError("'{}' is not a valid ISO-8601 time-stamp".format(timestamp))

parser = argparse.ArgumentParser()
parser.add_argument('startTime', type=convertIsoTime)
parser.add_argument('--good', type=int,
                    help='foo')
args = parser.parse_args(['--gold','5','2015-01-01T00:00:00UTC'])
will report:
error: argument startTime: '5' is not a valid ISO-8601 time-stamp
When I would prefer it to report the more useful:
error: no such option: --gold
Is it possible to achieve this? It seems to me a quite basic use case. When writing argument parsers directly, I typically use a pattern in which anything starting with a -- option prefix that is not a known option is rejected immediately. For example, in bash:
# Process command-line arguments
while [ $# -gt 0 ]; do
    case "$1" in
        --debug)
            DEBUGOPTION="--debug"
            shift
            break;;
        --)
            shift
            break;;
        --*)
            handleUsageError "$1"
            shift;;
        *)
            break;;
    esac
done
I believe argparse uses regular expressions internally, but I don't think they are accessible via add_argument().
Is there any way to do the equivalent easily with argparse?
The short answer is that parse_args uses parse_known_args. That method collects unknown arguments like --gold and returns them instead of rejecting them immediately. As a result, argument type errors get raised before unknown-argument errors.
I've added a solution that involves subclassing ArgumentParser and modifying a method deep in its calling stack.
I'll try to outline parse_args as applied to your example.
The first thing it does is categorize the strings as either O or A. Put simply, ones that begin with - are O, others A. It also tries to match the O ones with a defined argument.
In your example, it finds OAA. A regular expression is then used to match this string against patterns defined by each argument's nargs. (If needed, I can explain this step in more detail.)
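The classification step can be illustrated with a deliberately simplified sketch. This is not argparse's actual code: the helper name classify is hypothetical, and the real parser also deals with negative numbers, a bare --, abbreviations and strings containing spaces. It only shows how the example command line turns into a pattern string, and how a regular expression for one positional with the default nargs then picks out a single A.

```python
import re

def classify(arg_strings):
    """Hypothetical, simplified stand-in for argparse's first pass."""
    parts = []
    for s in arg_strings:
        # Strings that start with '-' (and are longer than one character)
        # are treated as option-like; everything else is an argument.
        parts.append('O' if s.startswith('-') and len(s) > 1 else 'A')
    return ''.join(parts)

argv = ['--gold', '5', '2015-01-01T00:00:00UTC']
pattern = classify(argv)
print(pattern)

# A positional with the default nargs wants exactly one 'A'.
# Once the unknown 'O' at index 0 has been set aside, matching starts at index 1.
match = re.match('(A)', pattern[1:])
consumed = argv[1:1 + len(match.group(1))]
print(consumed)
```

Under these assumptions, the string that ends up being handed to the first positional is the 5, not the time-stamp.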
--gold does not match; at some point (whether in this initial loop or later) it gets put into an extras list. (I'll check the code for details.)
In the second loop through the strings, it alternates between handling positionals and optionals.
It's when trying to match the 5 with startTime that your type function raises the type error, which propagates up to printing the usage and exiting. Because --gold is not defined, 5 is not consumed as an optional's argument. Thus it gets parsed as the first positional string. (Some kinds of optionals take 0 arguments, so it does not assume that anything following a --... is an optional's argument.)
I think that, without the 5, the last string would match. parse_known_args would return with --gold in the extras list. parse_args uses parse_known_args but raises an error when extras is not empty.
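This can be checked by calling parse_known_args directly on the example with the 5 left out. A minimal sketch that reuses the parser from the question; build_parser is only a hypothetical helper name used here to keep the setup in one place.

```python
import argparse
import datetime

def convertIsoTime(timestamp):
    try:
        return datetime.datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SUTC")
    except ValueError:
        raise argparse.ArgumentTypeError(
            "'{}' is not a valid ISO-8601 time-stamp".format(timestamp))

def build_parser():
    # Same arguments as in the question.
    parser = argparse.ArgumentParser()
    parser.add_argument('startTime', type=convertIsoTime)
    parser.add_argument('--good', type=int, help='foo')
    return parser

# No '5' this time, so the time-stamp is the first positional string.
namespace, extras = build_parser().parse_known_args(
    ['--gold', '2015-01-01T00:00:00UTC'])
print(namespace.startTime)  # the parsed datetime
print(extras)               # the strings argparse could not place
```

Here the unknown --gold comes back in extras rather than causing an error; parse_args is what turns a non-empty extras list into an error.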
So in a sense the parser does detect both errors, but it's the startTime one that triggers the error message. It waits until the end to complain about the unrecognized --gold.
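To see which message wins, the failing call from the question can be run with stderr captured. This is only a sketch: prog='PROG' is set so the usage line does not depend on the script name, and SystemExit is caught so the snippet keeps running after argparse tries to exit.

```python
import argparse
import contextlib
import datetime
import io

def convertIsoTime(timestamp):
    try:
        return datetime.datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SUTC")
    except ValueError:
        raise argparse.ArgumentTypeError(
            "'{}' is not a valid ISO-8601 time-stamp".format(timestamp))

parser = argparse.ArgumentParser(prog='PROG')
parser.add_argument('startTime', type=convertIsoTime)
parser.add_argument('--good', type=int, help='foo')

captured = io.StringIO()
exit_code = None
with contextlib.redirect_stderr(captured):
    try:
        parser.parse_args(['--gold', '5', '2015-01-01T00:00:00UTC'])
    except SystemExit as exc:
        # parser.error() prints the usage and message, then exits.
        exit_code = exc.code

message = captured.getvalue()
print(message)
```

The captured text contains the startTime complaint only; parsing stops there, so the check on the unknown option is never reached.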
As a general philosophy, argparse does not try to detect and present all errors. It does not collect a list of errors to present in one final comprehensive message.
I'll review the code to check the details. I don't think you can easily change the basic parsing pattern. If I think of a way to force an earlier unrecognized option error, I'll edit this answer.
def _parse_optional(self, arg_string):
tries to classify an argv string. If the string looks like a positional, it returns None. If it matches an Action option_string, it returns a tuple (action, option_string, None) with the matching action. Finally, if there is no match, it returns:
# it was meant to be an optional but there is no such option
# in this parser (though it might be a valid option in a subparser)
return None, arg_string, None
I think that is what happens with your --gold. Note the reason why it might still be a valid option.
This function is called by
def _parse_known_args(self, arg_strings, namespace):
    ...
    for i, arg_string in enumerate(arg_strings_iter):
        ....
        option_tuple = self._parse_optional(arg_string)
        if option_tuple is None:
            pattern = 'A'
        else:
            option_string_indices[i] = option_tuple
            pattern = 'O'
        arg_string_pattern_parts.append(pattern)
    ...
    # at the end
    # return the updated namespace and the extra arguments
    return namespace, extras
collecting that 'OAA' pattern, as well as a list of these tuples.
During a 2nd loop it alternates between consuming positionals and optionals. The function that consumes an optional is:
def consume_optional(start_index):
    option_tuple = option_string_indices[start_index]
    action, option_string, explicit_arg = option_tuple
    if action is None:
        extras.append(arg_strings[start_index])
    ...otherwise...
        take_action(action, args, option_string)
As I wrote earlier, your --gold gets put on the extras list, while 5 remains on the list of arguments that can be parsed as positionals.
The namespace and extras are passed on through parse_known_args to you, the user, or to parse_args.
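If you take the extras yourself, you can word the complaint however you like. A minimal sketch under stated assumptions: parse_rejecting_unknown is a hypothetical helper, not part of argparse, and the small parser below is only for illustration. Note that this check still runs after parse_known_args has finished, so it does not move the unknown-option error ahead of a type error on a positional.

```python
import argparse
import contextlib
import io

def parse_rejecting_unknown(parser, argv):
    """Hypothetical helper: like parse_args, but with a custom message."""
    namespace, extras = parser.parse_known_args(argv)
    unknown = [s for s in extras if s.startswith('-')]
    if unknown:
        parser.error('no such option: %s' % ' '.join(unknown))
    if extras:
        # Leftover strings that do not look like options.
        parser.error('unrecognized arguments: %s' % ' '.join(extras))
    return namespace

parser = argparse.ArgumentParser(prog='PROG')
parser.add_argument('name')
parser.add_argument('--good', type=int)

ok = parse_rejecting_unknown(parser, ['--good', '3', 'x'])
print(ok)

captured = io.StringIO()
exit_code = None
with contextlib.redirect_stderr(captured):
    try:
        parse_rejecting_unknown(parser, ['--gold', 'x'])
    except SystemExit as exc:
        exit_code = exc.code
print(captured.getvalue())
```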
Conceivably you could subclass ArgumentParser and define a modified _parse_optional method. It could raise an error instead of returning that (None, arg_string, None) tuple.
import argparse
import datetime

class MyParser(argparse.ArgumentParser):
    def _parse_optional(self, arg_string):
        arg_tuple = super(MyParser, self)._parse_optional(arg_string)
        if arg_tuple is None:
            return arg_tuple  # positional
        else:
            if arg_tuple[0] is not None:
                return arg_tuple  # valid optional
            else:
                msg = 'error: no such option: %s'%arg_string
                self.error(msg)

def convertIsoTime(timestamp):
    """read ISO-8601 time-stamp using the AMS conventional format YYYY-MM-DDThh:mm:ssUTC"""
    try:
        return datetime.datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SUTC")
    except ValueError:
        raise argparse.ArgumentTypeError("'{}' is not a valid ISO-8601 time-stamp".format(timestamp))

# parser = argparse.ArgumentParser()
parser = MyParser()
parser.add_argument('startTime', type=convertIsoTime)
parser.add_argument('--good', type=int,
                    help='foo')
args = parser.parse_args(['--good','5','2015-01-01T00:00:00UTC'])
print(args)
args = parser.parse_args(['--gold','5','2015-01-01T00:00:00UTC'])
produces
1505:~/mypy$ python3 stack31317166.py
Namespace(good=5, startTime=datetime.datetime(2015, 1, 1, 0, 0))
usage: stack31317166.py [-h] [--good GOOD] startTime
stack31317166.py: error: error: no such option: --gold
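The doubled "error: error:" in that output comes from the message carrying its own "error:" prefix while ArgumentParser.error() adds another. The variant below is a sketch, not a tested recipe for every Python release: it drops the extra prefix, and it assumes that this private method returns either a single tuple (as in the code quoted above) or, as I believe newer CPython versions do, a list of such tuples, with the matched action (or None) as the first element in both cases. StrictParser is a hypothetical name.

```python
import argparse
import contextlib
import datetime
import io

class StrictParser(argparse.ArgumentParser):
    def _parse_optional(self, arg_string):
        result = super()._parse_optional(arg_string)
        if result is None:
            return None  # looks like a positional
        # Assumption: one tuple (older CPython) or a list of tuples (newer).
        candidates = result if isinstance(result, list) else [result]
        if candidates and all(c[0] is None for c in candidates):
            # error() adds the "error:" prefix itself, then exits.
            self.error('no such option: %s' % arg_string)
        return result

def convertIsoTime(timestamp):
    try:
        return datetime.datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SUTC")
    except ValueError:
        raise argparse.ArgumentTypeError(
            "'{}' is not a valid ISO-8601 time-stamp".format(timestamp))

parser = StrictParser(prog='PROG')
parser.add_argument('startTime', type=convertIsoTime)
parser.add_argument('--good', type=int, help='foo')

good_args = parser.parse_args(['--good', '5', '2015-01-01T00:00:00UTC'])
print(good_args)

captured = io.StringIO()
exit_code = None
with contextlib.redirect_stderr(captured):
    try:
        parser.parse_args(['--gold', '5', '2015-01-01T00:00:00UTC'])
    except SystemExit as exc:
        exit_code = exc.code
print(captured.getvalue())
```

Because the override depends on a private method, it is worth rerunning a check like this whenever the Python version changes.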
Subclassing to provide a custom action is good argparse (and Python) practice.
If you want more consideration of this case by Python developers, consider writing a bug/issue (a PEP is for more developed, formal ideas). But there is quite a backlog of argparse bugs/patches, and a lot of caution about backwards compatibility.
http://bugs.python.org/issue?%40columns=id%2Cactivity%2Ctitle%2Ccreator%2Cassignee%2Cstatus%2Ctype&%40sort=-activity&%40filter=status&%40action=searchid&ignore=file%3Acontent&%40search_text=_parse_optional&submit=search&status=-1%2C1%2C2%2C3
is a list of bugs/issues that reference _parse_optional. Possible changes include how ambiguous optionals are handled. (I'll scan them to see if I'm forgetting anything. Some of the patches are mine.) But by using super, my suggested change is not affected by changes within the function. It's affected only by changes in how the function is called and what it returns, which is much less likely to occur. By filing your own issue, you at least put the developers on notice that someone depends on this interface.