Strip all HTML tags except links

I am trying to write a regular expression that strips all HTML tags except links (the <a href=...> and </a> tags). It does not have to be 100% secure: I am not worried about injection attacks, because the content I am parsing has already been approved and published into a SWF movie.

The original "strip tags" regular expression I'm using was <(.|\n)+?>, and I tried to modify it to <([^a]|\n)+?>, but that of course preserves any tag that contains an a anywhere, rather than only tags that begin with an a followed by a space.
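
The failure described above can be reproduced with a small sketch. Python's re module is used here purely for illustration (the question itself targets ActionScript 3.0), and the sample string and variable names are made up for the example.

```python
import re

# The modified pattern from the question: one or more non-"a" characters
# (or newlines) between < and >.
MODIFIED = re.compile(r"<([^a]|\n)+?>")

# Hypothetical sample input, not taken from the original post.
sample = '<p>Hi <span>there</span> <a href="x">link</a></p>'

# Remove everything the pattern matches.
result = MODIFIED.sub("", sample)
print(result)
```

With this input, <p> and </p> are stripped, but <span> and </span> survive along with the link, because their names contain an a.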

Not that it should really matter, but in case anyone cares to know I am writing this in ActionScript 3.0 for a Flash movie.

Jeff Winkworth asked Sep 04 '08 16:09



1 Answer

<(?!\/?a(?=>|\s.*>))\/?.*?> 

Try this. I had something similar for p tags, and it worked there, so I don't see why it wouldn't work here. It uses a negative lookahead to reject a match when the tag name is a (with an optional / prefix) and, via a nested positive lookahead, that a is followed either by > or by whitespace, more characters, and then >. For any other tag, the pattern then matches up to the next > character. Put this in a substitution:

s/<(?!\/?a(?=>|\s.*>))\/?.*?>//g; 

This should strip every other tag and leave only the opening and closing a tags.
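
As a quick check of the pattern above, here is a minimal sketch in Python rather than Perl or ActionScript. The regular expression is the one from this answer with the unneeded backslashes before / dropped; the function name strip_except_links and the sample string are hypothetical and only serve the illustration.

```python
import re

# Same pattern as above: skip <a ...>, <a> and </a>, match every other tag.
NON_LINK_TAG = re.compile(r"<(?!/?a(?=>|\s.*>))/?.*?>")


def strip_except_links(html: str) -> str:
    """Remove every tag except opening and closing a tags (hypothetical helper)."""
    return NON_LINK_TAG.sub("", html)


# Hypothetical sample input; <abbr> is included to show that a tag whose
# name merely starts with "a" is still stripped.
sample = (
    '<p>Hi <b>there</b>, see <a href="http://example.com">this link</a> '
    "and <abbr>etc</abbr>.</p>"
)

stripped = strip_except_links(sample)
print(stripped)
```

Note that . does not match a newline by default, so a tag that is split across lines would be left untouched by this version.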

Xetius answered Sep 20 '22 06:09