
Why `.` in PEP 0263 regex?

In PEP 0263 the format for defining the encoding of a Python file is defined as:

coding[:=]\s*([-\w.]+)

Why is there a . in the regex, or alternatively, why are there - and \w? As far as I understand, . matches any character except a newline, so either [-\w] or [.] on its own would seem to match the legal names, which consist of alphanumeric characters and the dash.

What is the reason for having both -\w and . specified together in [-\w.]?

dotancohen asked Mar 19 '23 19:03



1 Answer

Inside a character class, . and - do not behave the way they do elsewhere in a regex, and they differ from each other. In a character class, . has no special meaning and matches only a literal dot, whereas - can be used to specify ranges such as a-zA-Z0-9.

In [-\w.] the - is the first character of the class, so it cannot form a range. Both . and - therefore match only themselves and carry no special meaning.
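The behaviour described above can be checked with a short Python 3 sketch. The helper name declared_encoding and the sample comment lines are made up for illustration; only the pattern itself comes from the question.

```python
import re

# The pattern quoted in the question.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(line):
    """Return the encoding name declared on a source line, or None.

    Hypothetical helper, written only to exercise the pattern.
    """
    match = CODING_RE.search(line)
    return match.group(1) if match else None


# Inside a class, '.' matches a literal dot only, not "any character".
dot_matches_dot = re.fullmatch(r"[.]", ".") is not None
dot_matches_letter = re.fullmatch(r"[.]", "a") is not None

# A leading '-' in a class is a literal dash, not a range operator.
dash_is_literal = re.fullmatch(r"[-\w.]+", "iso-8859-15") is not None

print(dot_matches_dot, dot_matches_letter, dash_is_literal)
print(declared_encoding("# -*- coding: utf-8 -*-"))
print(declared_encoding("# vim: set fileencoding=latin-1 :"))
```

The two sample lines follow the Emacs-style and Vim-style comment forms. In both, the capture group stops at the first character that is not a word character, a dash, or a dot.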

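Also note that \w is equivalent to [a-zA-Z0-9_] when the LOCALE and UNICODE flags are not set. Of the punctuation characters it matches only the underscore (_), not the dash (-), so the dash has to be listed in the class separately.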
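As a rough illustration of why each of the three parts of the class is needed, the sketch below matches two sample names against progressively larger classes. The names are chosen only as examples of a dashed and a dotted name, and the variable names are arbitrary.

```python
import re

dashed = "iso-8859-15"
dotted = "ANSI_X3.4-1968"  # sample name containing a dot, for illustration

# \w covers letters, digits and underscore, so it stops at the first dash.
word_only = re.match(r"\w+", dashed).group(0)

# Adding '-' to the class lets the match continue across dashes.
with_dash = re.match(r"[-\w]+", dashed).group(0)

# Without '.' in the class, a dotted name is cut off at the dot.
without_dot = re.match(r"[-\w]+", dotted).group(0)

# The full class from the question captures the whole name.
with_dot = re.match(r"[-\w.]+", dotted).group(0)

print(word_only, with_dash, without_dot, with_dot)
```

Dropping either - or . from the class truncates the captured name, which is why the pattern lists both alongside \w.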

Quoting from the Python RegEx documentation,

\w

When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.
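The quoted passage describes the flags as documented for Python 2. As a hedged sketch for Python 3, where str patterns are Unicode-aware by default and re.ASCII restricts \w to [a-zA-Z0-9_], the difference can be observed like this (the sample word is arbitrary):

```python
import re

word = "caf\u00e9"  # "café", contains a non-ASCII letter

# Default for str patterns in Python 3: \w follows Unicode properties.
unicode_full = re.fullmatch(r"\w+", word) is not None

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the accented letter fails.
ascii_full = re.fullmatch(r"\w+", word, flags=re.ASCII) is not None

# Under either setting, \w does not match a dash.
dash_unicode = re.fullmatch(r"\w", "-") is not None
dash_ascii = re.fullmatch(r"\w", "-", flags=re.ASCII) is not None

print(unicode_full, ascii_full, dash_unicode, dash_ascii)
```

Whichever flag is in effect, the dash and the dot stay outside \w, so the class still needs them spelled out.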

thefourtheye answered Mar 28 '23 03:03
