 

Confused on a basic operation of regular expressions

Tags:

regex

dfa

nfa

I have a rather basic question about regexes.
I use the expression .* without thinking about it, expecting it to match, e.g., up to the end of the line. This works.
But for some reason I started thinking about this expression. Checking Wikipedia (my emphasis):

.  Matches any single character  
*  Matches the **preceding** element zero or more times  
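A minimal Perl sketch of what "preceding element" means in practice: `*` repeats only the single atom right before it, and a group changes what that atom is. The helper `leading_match` is hypothetical, introduced only for this illustration.

```perl
use strict;
use warnings;

# Return the text that $re matches at the start of $str, or undef if none.
# (leading_match is a hypothetical helper for this example.)
sub leading_match {
    my ($str, $re) = @_;
    return $str =~ /^($re)/ ? $1 : undef;
}

my $m1 = leading_match("abbbab",  qr/ab*/);     # "*" repeats only "b"
my $m2 = leading_match("abababx", qr/(?:ab)*/); # the group "ab" is repeated
my $m3 = leading_match("abcdefg", qr/.*/);      # "." is re-applied each time

print "$m1\n$m2\n$m3\n";   # abbb, ababab, abcdefg (one per line)
```

In each case the repeated element is a pattern, not a previously matched piece of text, which is why `.*` covers `abcdefg` in full.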

So, according to this definition, why doesn't .* match the first character of the string zero or more times, instead of applying the match to each character in the string?
I mean, if I have abc, shouldn't it try to match a, aa, aaa, etc.?
But it does not:

$ perl -e '
> my $var="abcdefg";
> $var =~ /(.*)/;
> print "$1\n";'
abcdefg
Jim asked Jan 12 '23 06:01



2 Answers

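The confusion starts with the word "element" in "Matches the **preceding** element zero or more times". The term "preceding element" here refers to the preceding pattern rather than to the preceding capture (or preceding match). In .*, the preceding element is the pattern . itself, so each repetition applies . anew and can match a different character.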
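The difference between repeating a pattern and repeating a capture can be sketched in Perl with a backreference. `.*` repeats the pattern `.`, while `(.)` followed by a backreference repeats the text that `.` captured, which is roughly the behaviour the question expected. The sub names are hypothetical; the backreference is `\2` here only because an outer group is added to capture the whole match.

```perl
use strict;
use warnings;

# Repeat the *pattern* ".": every repetition may match a different character.
sub repeat_pattern {
    my ($str) = @_;
    return $str =~ /^(.*)/ ? $1 : undef;
}

# Repeat the *captured text*: (.) captures one character as group 2, and
# \2* then matches only further copies of that same character.
# (Both sub names are hypothetical, for illustration only.)
sub repeat_capture {
    my ($str) = @_;
    return $str =~ /^((.)\2*)/ ? $1 : undef;
}

for my $str ("abcdefg", "aaabcd") {
    printf "%-8s pattern: %-8s capture: %s\n",
        $str, repeat_pattern($str), repeat_capture($str);
}
```

For `abcdefg` the capture version stops after `a`; for `aaabcd` it matches `aaa`, while the pattern version takes the whole string both times.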

Kuba Wyrostek answered Feb 03 '23 06:02



This:

.{2,4}

matches the same strings as this:

(..)|(...)|(....)
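A quick Perl check of this equivalence, with one caveat: the two forms accept the same set of strings, but Perl's backtracking engine tries alternatives left to right, whereas the greedy quantifier tries the largest count first, so unanchored they can choose different matches. The sketch uses non-capturing groups `(?:...)` rather than the answer's capturing ones, and the sub names are hypothetical.

```perl
use strict;
use warnings;

# Anchored: does the whole string match? (hypothetical helper names)
sub quant_ok { return $_[0] =~ /^.{2,4}\z/ ? 1 : 0 }
sub alt_ok   { return $_[0] =~ /^(?:..|...|....)\z/ ? 1 : 0 }

# Unanchored: which substring is chosen?
sub first_quant { return $_[0] =~ /(.{2,4})/ ? $1 : undef }
sub first_alt   { return $_[0] =~ /(..|...|....)/ ? $1 : undef }

# Same set of strings: both accept exactly the lengths 2 to 4.
for my $len (0 .. 6) {
    my $s = "x" x $len;
    printf "length %d: quantifier=%d alternation=%d\n",
        $len, quant_ok($s), alt_ok($s);
}

# Different preference: greedy {2,4} takes 4 characters, while the
# alternation stops at its first (shortest) alternative.
printf "%s vs %s\n", first_quant("abcdefg"), first_alt("abcdefg");  # abcd vs ab
```

Listing the alternatives longest first, as in `....|...|..`, would reproduce the quantifier's preference as well as its set of matches.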

In the same way, this:

.*

matches the same strings as this:

()|(.)|(..)|(...)| // etc.
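This unbounded alternation also accounts for the output in the question: because `*` is greedy, Perl tries the longest candidate first and gives characters back only if the rest of the pattern fails, while the lazy form `*?` tries the candidates in the opposite order. A minimal sketch, using a hypothetical helper `capture1`:

```perl
use strict;
use warnings;

# Return $1 after matching $re against $str, or undef if there is no match.
# (capture1 is a hypothetical helper for this example.)
sub capture1 {
    my ($str, $re) = @_;
    return $str =~ $re ? $1 : undef;
}

my $s = "abcdefg";

# Greedy: ".*" first tries the longest candidate, the whole string.
my $all = capture1($s, qr/^(.*)/);

# If the rest of the pattern fails, ".*" gives characters back one at a
# time ("abcdef", "abcde", ...) until "c" can match next, leaving "ab".
my $before_c = capture1($s, qr/^(.*)c/);

# Lazy: ".*?" tries the empty candidate first, and nothing forces it further.
my $lazy = capture1($s, qr/^(.*?)/);

print "[$all] [$before_c] [$lazy]\n";   # [abcdefg] [ab] []
```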
Oliver Charlesworth answered Feb 03 '23 08:02
