In chapter 9 (Regular Expressions) of the book Eloquent JavaScript, under the section "Parsing an INI File", there is an example that includes a regular expression I don't understand at all. The author is trying to parse the following content:
searchengine=http://www.google.com/search?q=$1
spitefulness=9.7
; comments are preceded by a semicolon...
; each section concerns an individual enemy
[larry]
fullname=Larry Doe
type=kindergarten bully
website=http://www.geocities.com/CapeCanaveral/11451
[gargamel]
fullname=Gargamel
type=evil sorcerer
outputdir=/home/marijn/enemies/gargamel
The rules for this format state that
Blank lines and lines starting with semicolons are ignored.
The code that parses this content goes over every line in the file. To handle comments, the author includes this expression:
^\s*(;.*)?
As far as I understand, this expression matches lines that may start with a sequence of
white space characters, including space, tab, form feed, line feed and other Unicode spaces
(source), followed by a semicolon ; and then a sequence of "any single character except line terminators: \n, \r, \u2028 or \u2029". All of this is restricted to {0,1} occurrences.
I don't see the point of the ? quantifier here. I'm not able to find (using regex101) any case where not limiting the number of occurrences of the matched string would be a problem. Why is that expression different from this one:
^\s*(;.*)
Thanks in advance.
Quantifiers in Python: A quantifier has the form {m,n}, where m and n are the minimum and maximum number of times the expression to which the quantifier applies must match. We can use quantifiers to specify the number of occurrences to match.
Note that a* means zero or more occurrences of a in the string, while a+ means one or more occurrences of a in the string.
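To make the quantifier rules above concrete, here is a small JavaScript sketch (JavaScript because the question is about Eloquent JavaScript; the same quantifiers behave the same way in Python). The variable names and sample inputs are made up for illustration. Each pattern is anchored with ^ and $ so that the whole string has to match.

```javascript
// Quantifiers applied to the single character "a".
const zeroOrMore = /^a*$/;        // a*     : zero or more occurrences
const oneOrMore = /^a+$/;         // a+     : one or more occurrences
const zeroOrOne = /^a?$/;         // a?     : zero or one occurrence
const explicitRange = /^a{0,1}$/; // a{0,1} : the long form of a?

// Hypothetical sample inputs: no "a", one "a", two "a"s.
const inputs = ["", "a", "aa"];

const table = inputs.map(input => ({
  input,
  star: zeroOrMore.test(input),
  plus: oneOrMore.test(input),
  optional: zeroOrOne.test(input),
  range: explicitRange.test(input),
}));

console.log(table);
```

The optional and range columns should always agree, since ? is just shorthand for {0,1}.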
The ^\s*(;.*) pattern requires a ;, so it cannot match a blank line.
The ^\s*(;.*)? pattern can match a blank line, because it does not require a ;.
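A quick way to see the difference is to run both patterns, exactly as written in the question, against a few lines. This is only a minimal sketch; the variable names and the sample lines are my own and are not taken from the book.

```javascript
// The two patterns from the question, unchanged.
const withOptional = /^\s*(;.*)?/;   // the comment part is optional
const withoutOptional = /^\s*(;.*)/; // the comment part is required

// Hypothetical sample lines: empty, whitespace only, and two comments.
const samples = ["", "   ", "; a comment", "  ; indented comment"];

const results = samples.map(line => ({
  line,
  optional: withOptional.test(line),
  required: withoutOptional.test(line),
}));

for (const r of results) {
  console.log(JSON.stringify(r.line), r.optional, r.required);
}
```

Both patterns accept the two comment lines, but only the version with ? accepts the empty and whitespace-only lines. One caveat: as quoted in the question the pattern has no $ at the end, so the optional version will actually succeed on any line at all, because ^\s* can match the empty string at the start; it only becomes a useful "blank or comment" test when the end of the line is anchored as well.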
The common part is ^\s*: the start of the line (or string), followed by zero or more whitespace characters.
Then 1) (;.*) matches a ; (exactly one, mandatory) followed by zero or more characters other than a newline, and 2) (;.*)? matches an optional sequence of a ; followed by zero or more characters other than a newline. The (...)? construct is an optional group, since ? is a quantifier matching one or zero occurrences of the quantified atom, where the atom can be a single symbol, a character class, or a group.
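You can also see the optional group at work by looking at what it captures. In JavaScript, a group that did not take part in the match shows up as undefined in the match array. The sketch below uses made-up sample strings.

```javascript
const pattern = /^\s*(;.*)?/;

// Blank line: the overall match succeeds, but the group matched nothing.
const blank = "".match(pattern);

// Comment line: the group captures the ; and the rest of the line.
const comment = "  ; note".match(pattern);

console.log(blank[0] === "", blank[1]); // true undefined
console.log(comment[0], "|", comment[1]); //   ; note | ; note
```

So with ? the match succeeds either way, and the captured group tells you whether a comment was actually present.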
Also, note that \s matches the LF and CR characters. That means that, if the MULTILINE modifier is on and the input is a text containing multiple lines, ^\s* may match across several lines until the first non-whitespace character.
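Here is a small sketch of that multi-line effect, using the m flag in JavaScript and a made-up three-line input. It also shows the per-line approach, where the text is split first so that \s never sees a line break.

```javascript
// Hypothetical input: a setting, an empty line, then an indented comment.
const text = "key=1\n\n  ; comment";

// With the m flag, ^ can match right after the first line break, and \s*
// then consumes the second line break as well as the indentation.
const acrossLines = text.match(/^\s*(;.*)/m);
console.log(acrossLines.index);               // 6
console.log(JSON.stringify(acrossLines[0])); // "\n  ; comment"

// Splitting into lines first keeps every test confined to a single line.
const perLine = text.split(/\r?\n/).map(line => /^\s*(;.*)/.test(line));
console.log(perLine); // [ false, false, true ]
```

The first match starts on the empty line and ends on the comment line, so it spans two lines. When the pattern is applied to one line at a time, as described in the question, this cannot happen.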