What's the difference between these two regular expressions? (Understanding the ? quantifier)

In Eloquent JavaScript, chapter 9 (Regular Expressions), the section "Parsing an INI File" has an example that includes a regular expression I don't understand at all. The author is trying to parse the following content:

searchengine=http://www.google.com/search?q=$1
spitefulness=9.7

; comments are preceded by a semicolon...
; each section concerns an individual enemy
[larry]
fullname=Larry Doe
type=kindergarten bully
website=http://www.geocities.com/CapeCanaveral/11451

[gargamel]
fullname=Gargamel
type=evil sorcerer
outputdir=/home/marijn/enemies/gargamel

The rules for this format state that

Blank lines and lines starting with semicolons are ignored.

The code that parses this content goes over every line in the file. To handle comments, the author includes this expression:

^\s*(;.*)?

As far as I understand, this expression handles lines that may start with a sequence of

white space characters, including space, tab, form feed, line feed and other Unicode spaces

(source), followed by a semicolon ; and then a sequence of "any single character except line terminators: \n, \r, \u2028 or \u2029". All of this is restricted to {0,1} occurrences.

I don't get the point of the ? quantifier here. I can't find (on regex101) any case where leaving out this limit on the number of occurrences causes a problem. How is that expression different from this one:

^\s*(;.*)

Thanks in advance.

Noob_Number_1 asked Aug 19 '16 15:08





1 Answer

^\s*(;.*) requires a ;, so it cannot match a blank line.

^\s*(;.*)? can match a blank line; it does not require a ;.
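A minimal sketch of that difference, run against a few sample lines. The variable names and sample strings are made up for illustration, and the end-anchored variant at the bottom is an assumption about what a line-by-line parser would want, not something taken from the question.

```javascript
// The two patterns from the question, exactly as written there.
const requiredSemicolon = /^\s*(;.*)/;
const optionalComment = /^\s*(;.*)?/;

// Hypothetical sample lines: empty, whitespace only, and two comments.
const samples = ["", "   ", "; a comment", "  ; indented comment"];

for (const line of samples) {
  console.log(
    JSON.stringify(line),
    "required:", requiredSemicolon.test(line),
    "optional:", optionalComment.test(line)
  );
}

// Assumption: an end-anchored variant, so that only blank lines and
// comment lines succeed. Without the trailing $, the optional form can
// succeed on any line by matching the empty string at its start.
const anchoredOptional = /^\s*(;.*)?$/;
console.log(optionalComment.test("fullname=Larry Doe"));  // true
console.log(anchoredOptional.test("fullname=Larry Doe")); // false
```

The first two samples succeed only with the optional group; the two comment lines succeed with both patterns.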

The common part is ^\s* - start of line (or string) and then zero or more whitespace characters.

After that, the two patterns differ:

1) (;.*) matches a ; (exactly one, mandatory) and then zero or more characters other than a newline.

2) (;.*)? matches an optional sequence of a ; followed by zero or more characters other than a newline. The (...)? is an optional group, since ? is a quantifier matching one or zero occurrences of the quantified atom; the atom can be a symbol, a character class, or a group.
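One way to see the optional group at work is to inspect the capture it produces. This is a small sketch with made-up input strings; it also checks that ? behaves the same as the explicit {0,1} form here.

```javascript
const optionalGroup = /^\s*(;.*)?/;

// The group takes part in the match: capture 1 holds the comment text.
const withComment = optionalGroup.exec("  ; note");
console.log(withComment[0]); // "  ; note"
console.log(withComment[1]); // "; note"

// The group is skipped: the overall match still succeeds on the
// whitespace alone, and capture 1 is undefined.
const whitespaceOnly = optionalGroup.exec("   ");
console.log(whitespaceOnly[0]); // "   "
console.log(whitespaceOnly[1]); // undefined

// ? is shorthand for {0,1}, so this pattern gives the same result.
const explicitRange = /^\s*(;.*){0,1}/;
console.log(explicitRange.exec("   ")[1]); // undefined
```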

Also, note that \s matches LF and CR characters, which means that (if the MULTILINE modifier is on and the input is text containing multiple lines) ^\s* may match across several lines, up to the first non-whitespace character.
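A short sketch of that multi-line behaviour, using the form with the mandatory ; so the effect is easy to see. The input string is invented for illustration.

```javascript
// Hypothetical multi-line input: a key line, two blank-ish lines, a comment.
const text = "key=1\n\n  \n; c";

// With the m flag, ^ can match right after the first line break.
// \s* then consumes the line breaks and spaces that follow, so a single
// match runs across several lines before it reaches the semicolon.
const multiline = /^\s*(;.*)/m;
const found = multiline.exec(text);
console.log(found.index);              // 6
console.log(JSON.stringify(found[0])); // "\n  \n; c"
console.log(found[1]);                 // "; c"

// Without the m flag, ^ matches only at the very start of the string,
// where the next character is "k", so there is no match at all.
const singleLine = /^\s*(;.*)/;
console.log(singleLine.exec(text));    // null
```

This does not come up when each line is tested on its own after splitting the file, since such a line contains no line breaks.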

Wiktor Stribiżew answered Oct 26 '22 19:10
