
regular expression to remove links [duplicate]

Tags: html, regex

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

I have an HTML page containing links of this form:

<a class="development" href="[variable content]">X</a>

The [variable content] differs from link to link; the rest is identical.
What regexp will match all of those links? (I did try, although I am not posting my attempts here.)

Itay Moav -Malimovka asked May 04 '09 16:05


3 Answers

What about the non-greedy version:

<a class="development" href="(.*?)">X</a>
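
As a rough illustration, the pattern can be used from Python's `re` module both to list the captured `href` values and to strip the links. The sample `html` string and the variable names are made up for this sketch; only the pattern comes from the answer.

```python
import re

# Non-greedy pattern from the answer; the capture group holds the href value.
LINK_RE = re.compile(r'<a class="development" href="(.*?)">X</a>')

# Hypothetical sample input, not taken from the question.
html = (
    '<p>Intro <a class="development" href="/a?x=1">X</a> middle '
    '<a class="development" href="/b">X</a> end</p>'
)

hrefs = LINK_RE.findall(html)      # list of captured href values
stripped = LINK_RE.sub("", html)   # same markup with the matched links removed

print(hrefs)
print(stripped)
```

Note that this only matches when the attributes appear in exactly this order, with exactly this quoting and spacing.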

vrish88 answered Oct 17 '22 03:10


Try this regular expression:

<a class="development" href="[^"]*">X</a>
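
One practical difference between `[^"]*` and a non-greedy `.*?` shows up when a link with the same class but different text precedes a matching one on the same line. The sketch below uses a made-up input to compare the two; the variable names are illustrative only.

```python
import re

NON_GREEDY = re.compile(r'<a class="development" href="(.*?)">X</a>')
NEGATED = re.compile(r'<a class="development" href="[^"]*">X</a>')

# Hypothetical input: the first link has text Y and should be left alone.
html = (
    '<a class="development" href="/keep">Y</a> '
    '<a class="development" href="/drop">X</a>'
)

# .*? may run past the closing quote of the first href until it finds ">X</a>,
# so the match starts at the first link and swallows both.
non_greedy_result = NON_GREEDY.sub("", html)

# [^"]* cannot cross a double quote, so only the second link matches.
negated_result = NEGATED.sub("", html)

print(repr(non_greedy_result))
print(repr(negated_result))
```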

Gumbo answered Oct 17 '22 03:10


Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.
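
For instance, a minimal sketch using Python's standard-library `html.parser` (one parser among many, chosen here only because it needs no installation) could collect the `href` of every `<a>` whose class list contains `development` and whose text is `X`. The class name `DevelopmentLinkFinder` and the sample input are hypothetical.

```python
from html.parser import HTMLParser


class DevelopmentLinkFinder(HTMLParser):
    """Collect href values of <a class="development"> elements whose text is X."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self._href = None   # href of the qualifying <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "development" in classes:
            self._href = attr_map.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "".join(self._text).strip() == "X":
                self.hrefs.append(self._href)
            self._href = None


# Hypothetical input with varied quoting, attribute order, and letter case.
html = (
    "<p><a href='/b' class='development other'>X</a> "
    '<a class="development" href="/keep">Y</a> '
    '<A CLASS="development" HREF="/c" >X</A></p>'
)

finder = DevelopmentLinkFinder()
finder.feed(html)
finder.close()
hrefs = finder.hrefs
print(hrefs)
```

Unlike the fixed-string patterns, this does not depend on attribute order, quote style, or tag case, since the parser normalizes tag and attribute names before the handlers see them.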

Chas. Owens answered Oct 17 '22 03:10