
How to add attribute to first P tag using PHP regular expression?

Tags: html, regex, php

WordPress spits out posts in this format:

<h2>Some header</h>
<p>First paragraph of the post</p>
<p>Second paragraph of the post</p>
etc.

To get my cool styling on the first paragraph (it's one of those things that looks good only sparingly) I need to hook into the get_posts function to filter its output with a preg_replace.

The goal is to get the above code to look like:

<h2>Some header</h>
<p class="first">First paragraph of the post</p>
<p>Second paragraph of the post</p>

I have this so far but it's not even working (the error is: "preg_replace() [function.preg-replace]: Unknown modifier ']'")

$output=preg_replace('<p[^>]*>', '<p class="first">', $content);

I can't use CSS3 pseudo-selectors because I need to support IE6, and I can't apply the :first-line pseudo-element (one that IE6 does support) on the parent container because it would hit the H2 instead of the first P.

Rod Boev asked Apr 02 '26 03:04


2 Answers

You may find it easier and more reliable to use an HTML parser such as this one. HTML is notoriously difficult to parse reliably (technically, impossible) with regular expressions, and the parser will give you a very simple means to find the nodes you're interested in. The first page of the doc has a tab labelled "How to modify HTML elements".
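As a rough illustration of the parser approach, here is a minimal sketch using PHP's bundled DOM extension (DOMDocument) rather than the parser linked above. The helper name add_first_class_dom and the wrapper <div> are assumptions made for this example, and the serialized markup may differ slightly in whitespace from the input.

```php
<?php
// Hypothetical helper (not from the original answer): adds class="first"
// to the first <p> element of an HTML fragment using PHP's DOM extension.
function add_first_class_dom(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog hints UTF-8 to the parser; the wrapper <div> lets us
    // extract only our own fragment again after parsing.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // First <p> in document order.
    $first = $doc->getElementsByTagName('p')->item(0);
    if ($first instanceof DOMElement) {
        // Keep any class that is already there and append "first".
        $existing = trim($first->getAttribute('class'));
        $first->setAttribute('class', $existing === '' ? 'first' : $existing . ' first');
    }

    // The wrapper is the outermost, and therefore first, <div>.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$content = "<h2>Some header</h2>\n<p>First paragraph of the post</p>\n<p>Second paragraph of the post</p>";
echo add_first_class_dom($content);
```

Because the class is set through the DOM, a class that already exists on the paragraph is preserved instead of duplicated.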

Brian Agnew answered Apr 03 '26 17:04


Two right possibilities:

  1. Do that in JavaScript. Using jQuery, for example, it's a one-liner: $("h2").next().addClass("first")
  2. Use an HTML parser. Regular expressions are not a good tool for what you want to do. Since loading a whole HTML parser just for this purpose is overkill, you'd really be better off using JavaScript.

The wrong way

Of course, to answer the question, here is the best way I can think of to make it happen with a regexp, though I don't recommend it.

$output = preg_replace('#(</h2>\s*<p[^>]*)>#im', '$1 class="first">', '<h2>Some header</h2> <p>First paragraph of the post</p> <p>Second paragraph of the post</p> ');

What we do is:

  • using preg_replace so we can use an advanced regexp to replace the code;
  • using the "i" flag so the regexp ignores case (the "m" flag only changes how ^ and $ match, so it is not actually needed here; \s already matches line breaks);
  • using </h2>\s* to match the closing "h2" tag and all the spaces/line breaks after it;
  • using <p[^>]* to match the "p" tag and its current attributes;
  • using parentheses to capture that;
  • using "$1" to put the captured part back into the replacement;
  • adding the class and the closing ">".
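The steps above can be sketched as a small function. This is only an illustration: the name add_first_class_regex is hypothetical, and the limit of 1 (so that only the paragraph after the first h2 is touched) is an assumption that is not part of the call shown earlier. The "m" flag is left out because it only affects ^ and $. The # delimiters are also what the attempt in the question was missing: without them PHP treats < and > as the delimiters, so the pattern ends at the first > and the following ] is read as a modifier, which is where "Unknown modifier ']'" comes from.

```php
<?php
// Hypothetical wrapper around the pattern discussed above.
function add_first_class_regex(string $content): string
{
    // '#' delimits the pattern; the trailing "i" makes it case-insensitive.
    // The fourth argument (1) is an assumption of this sketch: it stops
    // after the first match so later <h2>/<p> pairs are left alone.
    $result = preg_replace('#(</h2>\s*<p[^>]*)>#i', '$1 class="first">', $content, 1);

    // preg_replace() returns null on error; fall back to the input then.
    return $result ?? $content;
}

$content = "<h2>Some header</h2>\n<p>First paragraph of the post</p>\n<p>Second paragraph of the post</p>";
echo add_first_class_regex($content);
```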

The first drawback I can think of is that it doesn't handle the case where a class already exists.
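One possible way around that drawback, still with regular expressions and still not recommended over a parser, is to use preg_replace_callback and merge into an existing class attribute. This is a sketch under assumptions: the function name add_first_class_merge is hypothetical, attribute values are assumed to be double-quoted, and only the first match is changed.

```php
<?php
// Hypothetical variant: appends "first" to an existing class attribute,
// or adds a new one when the <p> has none.
function add_first_class_merge(string $content): string
{
    $result = preg_replace_callback(
        // \b after "p" keeps tags such as <pre> from matching.
        '#(</h2>\s*)<p\b([^>]*)>#i',
        function (array $m): string {
            $attrs = $m[2];
            // Match class="..." but not e.g. data-class="...".
            $classPattern = '/(?<![\w-])class\s*=\s*"([^"]*)"/i';
            if (preg_match($classPattern, $attrs)) {
                $attrs = preg_replace($classPattern, 'class="$1 first"', $attrs, 1);
            } else {
                $attrs .= ' class="first"';
            }
            return $m[1] . '<p' . $attrs . '>';
        },
        $content,
        1 // assumption: only the first <h2>/<p> pair is changed
    );

    // Fall back to the untouched input if the regex engine reports an error.
    return $result ?? $content;
}

echo add_first_class_merge("<h2>Some header</h2>\n<p class=\"intro\">First paragraph of the post</p>");
```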

Oh, and by the way, you have <h2>...</h> instead of <h2>...</h2>. I don't know if it's a typo, but I assumed it was. Adjust the regexp accordingly if it's not.

e-satis answered Apr 03 '26 17:04


