PEP 0263 defines the format for declaring the encoding of a Python file as:
coding[:=]\s*([-\w.]+)
Why is there a . in the regex, or alternatively, why are there - and \w? As far as I understand, the . matches any character except a newline, so either [-\w] or [.] would match the legal names, which consist of alphanumeric characters and the dash.
What is the reason for having both -\w and . specified together in [-\w.]?
Inside a character class, . and - behave differently. There, . has no special meaning and is treated as a literal dot, whereas - can be used to specify ranges such as a-zA-Z0-9.
Since - appears first in the class here and so does not denote a range, both . and - match only themselves. Neither has a special meaning.
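A minimal sketch of the difference, using Python's re module. The variable names are only illustrative; the point is that . is a wildcard outside a character class but a plain dot inside one.

```python
import re

# Inside a character class, "." and a leading "-" are literal characters.
literal_class = re.compile(r"[-.]")

# Outside a character class, "." matches any character except a newline.
any_char = re.compile(r".")

print(bool(literal_class.fullmatch(".")))  # True: literal dot
print(bool(literal_class.fullmatch("-")))  # True: literal dash
print(bool(literal_class.fullmatch("a")))  # False: "a" is neither
print(bool(any_char.fullmatch("a")))       # True: wildcard dot
```

So [.] alone would accept only a literal dot, not arbitrary characters.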
Also note that \w can be defined as [a-zA-Z0-9_]. It matches the underscore character (_) but not the dash (-).
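To see why the dash has to be listed separately, here is a small sketch that applies the pattern from the question to a few sample comment lines. The helper find_encoding and the sample lines are hypothetical, written only for this illustration.

```python
import re

# Pattern copied from the PEP 0263 excerpt in the question.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

# Same pattern with the dash removed from the class, for comparison.
NO_DASH_RE = re.compile(r"coding[:=]\s*([\w.]+)")


def find_encoding(line, pattern=CODING_RE):
    """Return the encoding name captured from a comment line, or None."""
    match = pattern.search(line)
    return match.group(1) if match else None


print(find_encoding("# -*- coding: utf-8 -*-"))                # utf-8
print(find_encoding("# vim: set fileencoding=iso-8859-15 :"))  # iso-8859-15
print(find_encoding("# coding=latin_1"))                       # latin_1

# Without "-" in the class, the capture stops at the first dash.
print(find_encoding("# -*- coding: utf-8 -*-", NO_DASH_RE))    # utf
```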
Quoting from the Python RegEx documentation,
\w
When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.
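The quoted text describes the flags as documented for Python 2. As a rough sketch of the same idea under Python 3 (an assumption about the reader's interpreter), where str patterns use Unicode matching by default and re.ASCII restricts \w to [a-zA-Z0-9_], the hypothetical helper below checks single characters:

```python
import re


def is_word_char(ch, flags=0):
    """Return True if the single character ch is matched by \\w."""
    return re.fullmatch(r"\w", ch, flags) is not None


print(is_word_char("_"))            # True: underscore is part of \w
print(is_word_char("-"))            # False: the dash is not
print(is_word_char("é"))            # True: Unicode matching is the default
print(is_word_char("é", re.ASCII))  # False: ASCII-only \w
```

In either mode the dash falls outside \w, which is why the class in the question lists it explicitly.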