Matching a multi-line pattern with PHP's preg_match()

How can I extract the text subject from this HTML code using a PHP preg_match() regular expression pattern?

      <table border=0>   <tr>   <td>     <h2>subject</h2>        </td> 

All the whitespace and newlines are left in on purpose, so the problem is extracting the subject name with a pattern that spans multiple lines.

Dmitriy Ryabinin asked Jan 22 '12 01:01


People also ask

What is the use of the preg_match() function?

The preg_match() function returns whether a match was found in a string.

What is the purpose of the preg_match() function in PHP?

The preg_match() function will tell you whether a string contains matches of a pattern.

What is multiline matching?

The multiline option (the m modifier) lets the regular expression engine handle an input string that consists of multiple lines. It changes the interpretation of ^ and $ so that they match the beginning and end of each line, instead of only the beginning and end of the input string.

What does preg_match() return?

preg_match() returns 1 if the pattern matches the given subject, 0 if it does not, or false on failure. Because 0 also evaluates to false in a loose comparison, use the === operator to tell "no match" apart from a failure.
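The three return values described above can be told apart with strict comparison. The following is a minimal sketch; the variable names and sample strings are made up for illustration, and the third call deliberately uses a malformed pattern (no closing delimiter) to provoke a failure.

```php
<?php
// Illustrative inputs only; none of these strings come from the question.
$found   = preg_match('/h2/', '<h2>subject</h2>'); // pattern matches
$missing = preg_match('/h3/', '<h2>subject</h2>'); // pattern does not match

// Malformed pattern: the closing delimiter is missing. preg_match() raises
// a warning (suppressed here with @) and returns false.
$failed = @preg_match('/h2', '<h2>subject</h2>');

var_dump($found === 1);      // bool(true)
var_dump($missing === 0);    // bool(true)
var_dump($failed === false); // bool(true)

// A loose check cannot distinguish "no match" from "failure":
var_dump($missing == false); // bool(true), even though nothing went wrong
```

Comparing with === 1 is usually enough when only a successful match matters.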


1 Answer

If you're looking for, say, an h2 tag nested within a td tag with only whitespace between the two, just use \s, which matches spaces, tabs, and newlines. For example:

preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#i',$str,$matches); // result is in $matches[1] 

See it in action here.
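As a self-contained sketch of the pattern above: the input below is modeled on the HTML in the question, but the exact positions of the line breaks are an assumption, since the original layout is not preserved here. The variable names $str, $matches, and $subject are for illustration.

```php
<?php
// Sample input modeled on the question; the line breaks are assumed.
$str = "<table border=0>\n  <tr>\n  <td>\n    <h2>subject</h2>\n       </td>\n";

$subject = null;

// \s* absorbs the spaces and newlines between the td and h2 tags,
// and the lazy (.*?) captures the text inside the h2.
if (preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#i', $str, $matches) === 1) {
    $subject = $matches[1];
}

echo $subject, PHP_EOL; // prints "subject"
```

No modifier other than i is needed here, because the newlines sit between the tags (where \s* handles them) rather than inside the captured text.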

For reference, the PHP manual lists the pattern modifiers you can pass to the preg_* functions. The ones that may interest you are:

  • s ("dotall"): this one makes . match every character, including newlines. So if the content of your <h2>.....</h2> were spread over multiple lines, you'd have to use

    preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#is',$str,$matches);

    so that the .*? can run across multiple lines (note the extra s at the end of the regex).

  • m ("multiline") : this one just lets ^ and $ match start/end of line instead of just the start/end of string. You only really need it if you're using ^ and $ in your pattern and want them to match the start/end of each individual line in your input.
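The effect of the two modifiers can be sketched as follows. The inputs are hypothetical: $html has heading text that itself spans two lines, and $text is a plain three-line string. preg_match_all() is used for the m case only because it returns the number of matches.

```php
<?php
// Hypothetical input: the heading text contains a newline.
$html    = "<td>\n<h2>first line\nsecond line</h2>\n</td>";
$pattern = '#<td>\s*<h2>(.*?)</h2>\s*</td>#i';

// Without s, . cannot cross the newline inside the heading: no match.
$withoutS = preg_match($pattern, $html);
// With s appended, . also matches newlines, so the whole heading is captured.
$withS = preg_match($pattern . 's', $html, $m);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
var_dump($m[1]);     // the two-line heading text

// m only changes what ^ and $ mean.
$text = "alpha\nbeta\ngamma";
// Without m, ^ and $ anchor to the whole string, which is not one word.
$wholeString = preg_match_all('/^\w+$/', $text);
// With m, ^ and $ anchor to each line, so every line matches.
$perLine = preg_match_all('/^\w+$/m', $text);

var_dump($wholeString); // int(0)
var_dump($perLine);     // int(3)
```

The two modifiers are independent and can be combined (for example #...#ism) when a pattern needs both behaviours.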
mathematical.coffee answered Sep 18 '22 03:09