
Regular Expression to Search+Replace href="URL"

I'm useless with regular expressions and haven't been able to google myself a clear solution to this one.

I want to search some text ($content) and replace any URL inside an anchor's href with a new URL (stored in the variable $newurl).

Change this:

<a href="http://blogurl.com/files/foobar.jpg"><img alt="foobar" src="http://blogurl.com/files/2011/03/foobar_thumb.jpg" /></a>

To this:

<a href="http://newurl.com/here/"><img alt="foobar" src="http://blogurl.com/files/2011/03/foobar_thumb.jpg" /></a>

I imagine using preg_replace would be best for this. Something like:

preg_replace('Look for href="any-url"', 
'href="$newurl"',$content);

The idea is to get all images on a WordPress front page to link to their posts instead of to the full-sized images (which is the default). Usually there would be only one URL to replace, but I don't think it would hurt to replace all potential matches.

Hope all that made sense and thanks in advance!

boopboopbeep asked Mar 05 '11 17:03


2 Answers

Here is the gist of what I came up with. Hopefully it helps someone:

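// Content of the current post (called inside the WordPress loop).
$content = get_the_content();
// Match only the text between the quotes of an href attribute.
$pattern = "/(?<=href=(\"|'))[^\"']+(?=(\"|'))/";
// Permalink of the current post, used as the new link target.
$newurl = get_permalink();
$content = preg_replace($pattern, $newurl, $content);

echo $content;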

Mucho thanko to @WiseGuyEh

boopboopbeep answered Oct 29 '22 15:10


This should do the trick:

(?<=href=("|'))[^"']+(?=("|'))

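It uses a lookbehind and a lookahead to assert that whatever it matches is preceded by href=" or href=' and followed by a single or double quote. Because lookarounds consume no characters, only the URL itself is matched and replaced; the attribute name and the quotes stay in place.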
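
To see the lookarounds at work outside WordPress, here is a minimal sketch that runs an equivalent pattern (written with a character class instead of the alternation) against the sample markup from the question. The hard-coded $newurl is an assumption standing in for whatever get_permalink() would return.

```php
<?php
// Sample markup from the question; $newurl is a stand-in for get_permalink().
$content = '<a href="http://blogurl.com/files/foobar.jpg"><img alt="foobar" src="http://blogurl.com/files/2011/03/foobar_thumb.jpg" /></a>';
$newurl  = 'http://newurl.com/here/';

// Lookbehind: the match must be preceded by href=" or href='.
// Lookahead: the match must be followed by a single or double quote.
$pattern = '/(?<=href=["\'])[^"\']+(?=["\'])/';

$result = preg_replace($pattern, $newurl, $content);
echo $result, "\n";
```

The src attribute of the img tag is left alone, because its value is not preceded by href=.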

Note: the regex cannot tell whether the document is valid HTML. If an href value opens with one kind of quote and closes with the other, the pattern still matches and silently ignores the error.
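
If mismatched quotes should be skipped rather than silently accepted, one possible variant captures the opening quote and uses a backreference to require the same quote at the end. This is only a sketch: replace_hrefs() is a hypothetical helper name, not something from the answers above, and it swaps the lookarounds for a capture group plus preg_replace_callback.

```php
<?php
// replace_hrefs() is a hypothetical helper, not part of the original answers.
function replace_hrefs(string $html, string $newurl): string
{
    // (["\']) captures the opening quote; \1 demands the same quote at the end.
    $pattern = '/href=(["\'])[^"\']*\1/';

    // A callback inserts $newurl literally, so any "$" or "\" in it is not
    // interpreted as a replacement reference.
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($newurl): string {
            return 'href=' . $m[1] . $newurl . $m[1];
        },
        $html
    ) ?? $html; // fall back to the input if the regex engine reports an error
}

$ok    = replace_hrefs('<a href="http://blogurl.com/files/foobar.jpg">x</a>', 'http://newurl.com/here/');
$mixed = replace_hrefs("<a href=\"http://blogurl.com/files/foobar.jpg'>x</a>", 'http://newurl.com/here/');

echo $ok, "\n";    // href is rewritten
echo $mixed, "\n"; // mismatched quotes: left unchanged
```

The trade-off is that a malformed href is left untouched instead of being rewritten, which may or may not be what you want for real-world post content.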

WiseGuyEh answered Oct 29 '22 15:10