
Understanding Regular Expressions

Tags: regex, php

I am tired of being frightened of regular expressions. This post is limited to PHP's implementation of regular expressions; however, any generic regular expression advice would be appreciated (just don't confuse me with material that does not apply to PHP).

The following (I believe) will remove any whitespace between numbers. Maybe there is a better way to do so, but I still want to understand what is going on.

$pat = '/\b(\d+)\s+(?=\d+\b)/';            // digits, whitespace, then a lookahead for more digits
$sub = "123 345";
$string = preg_replace($pat, '$1', $sub);  // $string is now "123345"

Going through the pattern, my interpretation is:

  • \b A word boundary
  • \d+ A subpattern of 1 or more digits
  • \s+ One or more whitespace characters
  • (?=\d+\b) Lookahead assertion of one or more digits followed by a word boundary?
  • Putting it all together, search for any word boundary followed by one or more digits and then some whitespace, and then do some sort of lookahead assertion on it, and save the results in $1 so it can replace the pattern?

Questions:

  • Is my above interpretation correct?
  • What is that lookahead assertion all about?
  • What is the purpose of the leading / and trailing /?
user1032531 asked Nov 30 '12 13:11


2 Answers

Is my above interpretation correct?

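Yes, your interpretation is essentially correct. One detail on the last point: the lookahead only checks that digits follow and does not consume them, so the match itself is just the leading digits plus the whitespace. $1 holds only the digits, and replacing the match with $1 is what removes the whitespace.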
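To see this concretely, here is a minimal sketch that runs the pattern from the question through preg_match and preg_replace. The second test string ('12 34 56 ab 78') is a made-up input of mine, not something from the question.

```php
<?php
// Pattern from the question: digits, whitespace, then a lookahead for more digits.
$pat = '/\b(\d+)\s+(?=\d+\b)/';

// preg_match shows what a single match actually consumes.
preg_match($pat, '123 345', $m);
// $m[0] is the full match "123 " (digits plus whitespace); $m[1] is "123".
// The "345" checked by the lookahead is not part of the match.

// Replacing each match with $1 drops only the whitespace between numbers.
// Hypothetical input: the space before "ab" and the one after it are kept,
// because neither is followed/preceded by digits in the required way.
$result = preg_replace($pat, '$1', '12 34 56 ab 78');
echo $result, "\n"; // prints "123456 ab 78"
```

Note that "34" takes part in two steps: first it is only looked at by the lookahead, then it is matched as the digits of the next match. That works precisely because the lookahead did not consume it.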

What is that lookahead assertion all about?

A lookahead assertion lets you match characters only when they are followed by a certain pattern, without making that pattern part of the match.

So basically, using the regex abcd(?=e) to match the string abcde will give you the match: abcd.

The reason that this matches is that the string abcde does in fact contain:

  1. An a
  2. Followed by a b
  3. Followed by a c
  4. Followed by a d that has an e after it (this is a single character!)

It is important to note that after the 4th item it also contains an actual "e" character, which we didn't match.

On the other hand, trying to match the same string against the regex abcd(?=f) will fail, since the sequence:

"a", followed by "b", followed by "c", followed by "d that has an f after it"

is not found.
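Both cases can be checked in PHP with a short sketch like the one below. The variable names and the replacement text "X" are just illustrative choices.

```php
<?php
// abcd(?=e): match "abcd" only when an "e" comes right after it.
$found = preg_match('/abcd(?=e)/', 'abcde', $m);
// $found is 1 and $m[0] is "abcd"; the "e" was checked but not consumed.

// abcd(?=f): the same string has no "f" after "abcd", so nothing matches.
$missing = preg_match('/abcd(?=f)/', 'abcde');
// $missing is 0.

// Because the "e" is not consumed, a replacement leaves it in place.
$replaced = preg_replace('/abcd(?=e)/', 'X', 'abcde');
echo $replaced, "\n"; // prints "Xe"
```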

What is the purpose of the leading / and trailing /?

Those are delimiters. PHP's preg_* functions require them so that the pattern can be told apart from any modifiers that follow the closing delimiter. A delimiter can be any character that is not alphanumeric, a backslash, or whitespace; I prefer @ signs myself. Remember that if the delimiter character appears inside your pattern, it needs to be escaped with a backslash.
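As a rough illustration, the sketch below matches the same made-up subject ('a/b') with three different delimiters and then adds a modifier after the closing delimiter. The subject string and variable names are assumptions for the example only.

```php
<?php
$subject = 'a/b';

// The same pattern with three different delimiters; all behave identically.
$slash = preg_match('/a\/b/', $subject); // "/" inside the pattern must be escaped
$at    = preg_match('@a/b@', $subject);  // no escaping needed with "@"
$hash  = preg_match('#a/b#', $subject);

// Anything after the closing delimiter is a modifier, e.g. "i" for case-insensitive.
$ci = preg_match('@A/B@i', $subject);

// preg_quote() can escape the delimiter in text that should be matched literally.
$literal = preg_quote('1/2', '/'); // backslash added before the "/"
```

Picking a delimiter that does not occur in the pattern (for example @ or # when matching paths or URLs) avoids the extra backslashes entirely.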

Asad Saeeduddin answered Oct 31 '22 18:10


It would be a good idea to watch this video and the four that follow it: http://blog.themeforest.net/screencasts/regular-expressions-for-dummies/ The rest of the series can be found here: http://blog.themeforest.net/?s=regex+for+dummies

A colleague sent me the series and after watching them all I was much more comfortable using Regular Expressions.

Another good idea would be to try RegexBuddy or RegExr. RegexBuddy in particular is very useful for understanding how a regular expression works.

Maarten00 answered Oct 31 '22 19:10