
Regex to disallow HTML tags? [duplicate]

Tags: html, regex

I'm in need of a regular expression that would allow anything except for HTML tags. The trick here is that < and > characters would be allowed, but just not with text between them (but other characters are fine).

The following would be allowed:

hello world
!@$%^&*()_+'":;[]{}()\|#
<<<<<<<
>>>>>
<>
><
<087>
<-->

The following would not be allowed:

<html>
<a>
<foo>
<bar>

I've tried several expressions with no luck. This turned out to be much harder than it first seemed (for me, anyway :P)

EDIT: Basically, anything is allowed except letters (A-Z and a-z) between < and > characters.

FiniteLooper asked Nov 03 '10 22:11


2 Answers

If you are doing this to prevent HTML injection on a website, a much better solution is to escape HTML special characters before sending them to the browser. Most web development environments and libraries have a standard function for this; for example, PHP has the htmlentities and htmlspecialchars functions.
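As a rough illustration of the escaping approach, here is a minimal sketch in Python rather than PHP. It uses the standard library's html.escape; the wrapper name escape_for_html is hypothetical and only there for the example.

```python
import html


def escape_for_html(user_text: str) -> str:
    """Escape HTML special characters so the browser shows them as text.

    html.escape replaces &, < and >, and with quote=True also " and '.
    """
    return html.escape(user_text, quote=True)


if __name__ == "__main__":
    # Input is kept as typed; only its HTML representation changes.
    print(escape_for_html("<a>hello</a>"))  # prints &lt;a&gt;hello&lt;/a&gt;
```

With this approach nothing has to be rejected: input such as <html> is stored as typed and simply rendered as literal text.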

Cameron Skinner answered Oct 30 '22 21:10


Shockingly, since you described your use case, it actually sounds like regexen will work here: you need to prevent <SomeTextHere> from showing up, with no restrictions on where, and there is certainly no need to worry about recursion. The following regex matches exactly the text you want to reject: <[A-Za-z]+> (change the + to a * if you also can't allow <>). It will match everywhere such text occurs, so I'd recommend putting the negation in the language instead (e.g., if (!/<[A-Za-z]+>/) { do_something() }). If you need the negation in the regex itself, and if your language supports such things, you can use a negative look-ahead assertion: ^(?!.*<[A-Za-z]+>). This says "match at the beginning of the string (^) if I can't find ((?!...)) the given text anywhere after it", but your matched string will contain no characters.
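Both variants described above can be sketched in Python as follows. The function names is_allowed and is_allowed_lookahead are hypothetical, and the re.DOTALL flag is an addition (an assumption that input may span several lines), since "." does not cross newlines by default.

```python
import re

# Pattern from the answer: "<", one or more ASCII letters, ">".
TAG_LIKE = re.compile(r"<[A-Za-z]+>")

# Negative look-ahead variant from the answer. re.DOTALL is an assumption
# added here so that ".*" also scans past newlines in multi-line input.
NO_TAG_LIKE = re.compile(r"^(?!.*<[A-Za-z]+>)", re.DOTALL)


def is_allowed(text: str) -> bool:
    """Negate in the language: allowed when the pattern is not found."""
    return TAG_LIKE.search(text) is None


def is_allowed_lookahead(text: str) -> bool:
    """Negate in the regex: the empty match at position 0 succeeds only
    when no tag-like text appears anywhere after it."""
    return NO_TAG_LIKE.match(text) is not None


if __name__ == "__main__":
    for sample in ["hello world", "<>", "<087>", "<-->", "<html>", "<a>"]:
        print(repr(sample), is_allowed(sample), is_allowed_lookahead(sample))
```

Note that this pattern only rejects the letters-only form given in the question; something like <a href="x"> or </a> still passes, which is one more reason to prefer escaping if the goal is to prevent HTML injection.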

Antal Spector-Zabusky answered Oct 30 '22 21:10