In PEP 0263 the format for defining the encoding of a Python file is defined as:
coding[:=]\s*([-\w.]+)
Why is there a . in the regex, or alternatively, why is there - and \w? So far as I understand, the . matches any character except a newline, so either [-\w] or [.] would match the legal names, which consist of alphanumeric characters and the dash.
What is the reason for having both -\w and . specified together in [-\w.]?
Inside a character class, . and - are treated differently from each other. There, . loses its special meaning and matches only a literal dot, whereas - can be used to specify ranges such as a-z, A-Z, or 0-9.
Since - is not used to form a range here (it is the first character in the class), both . and - match only themselves; neither has a special meaning.
Also note that \w is equivalent to [a-zA-Z0-9_] when no LOCALE or UNICODE flag is set. It matches the underscore (_) but not the dash (-), so the dash has to be listed separately.
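One way to check this is to try the character classes directly. The following is a minimal sketch using Python's re module; the helper name `matches` and the sample strings are made up for illustration.

```python
import re

def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# Outside a character class, an unescaped dot matches any character except a newline.
print(matches(r".", "x"))

# Inside a character class, the dot only matches a literal dot.
print(matches(r"[.]", "x"))
print(matches(r"[.]", "."))

# A dash placed first in the class is a literal dash.
print(matches(r"[-\w.]", "-"))

# A dash between two characters forms a range and no longer matches itself.
print(matches(r"[a-c]", "b"))
print(matches(r"[a-c]", "-"))
```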
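To see why the dash has to be in the class, here is a small sketch that applies the pattern from the question to a few sample comment lines. The names `CODING_RE`, `NO_DASH_RE`, and `declared_encoding` are hypothetical, and `some.name-1` is an invented encoding name used only to exercise the dot.

```python
import re

# The pattern as quoted in the question.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

# The same pattern without the dash in the class, for comparison.
NO_DASH_RE = re.compile(r"coding[:=]\s*([\w.]+)")

def declared_encoding(line, pattern=CODING_RE):
    """Return the encoding name captured from line, or None if absent."""
    m = pattern.search(line)
    return m.group(1) if m else None

print(declared_encoding("# -*- coding: utf-8 -*-"))
print(declared_encoding("# vim: set fileencoding=latin_1 :"))
print(declared_encoding("# coding=some.name-1"))

# Without the dash, the capture stops at the first "-".
print(declared_encoding("# -*- coding: utf-8 -*-", NO_DASH_RE))
```

With the dash removed from the class, the last call captures only `utf`, because \w alone does not match "-".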
Quoting from the Python RegEx documentation,
\w
When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.
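The quoted passage describes Python 2. As a rough sketch of the same distinction in Python 3, where str patterns use Unicode matching by default and the re.ASCII flag restricts \w to [a-zA-Z0-9_], the following can be tried. The helper name `is_word_char` is hypothetical.

```python
import re

def is_word_char(ch, flags=0):
    """Return True if the single character ch is matched by \\w."""
    return re.fullmatch(r"\w", ch, flags) is not None

# "\u00e9" is the letter e with an acute accent.
print(is_word_char("\u00e9"))            # Unicode matching (Python 3 default for str)
print(is_word_char("\u00e9", re.ASCII))  # ASCII-only matching

# The underscore is a word character either way; the dash never is.
print(is_word_char("_", re.ASCII))
print(is_word_char("-"))
```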