I need a regular expression that allows anything except HTML tags. The catch is that < and > characters are allowed, just not with text between them (other characters between them are fine).
The following would be allowed:
hello world
!@$%^&*()_+'":;[]{}()\|#
<<<<<<<
>>>>>
<>
><
<087>
<-->
The following would not be allowed:
<html>
<a>
<foo>
<bar>
I've tried several expressions with no luck. This turned out to be harder than it seemed at first (for me, anyway :P).
EDIT: Basically, anything is allowed except letters (A-Z and a-z) between < and > characters.
If you are doing this to prevent HTML injection on a website, a much better solution is to escape HTML special characters before sending them to the browser. Most web development environments and libraries have a standard function for this; PHP, for example, has the htmlentities and htmlspecialchars functions.
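As a minimal sketch of the escaping approach, assuming a Python backend: the standard library's html.escape plays the same role as PHP's htmlspecialchars. The wrapper name render_user_text is hypothetical, used only for illustration.

```python
import html

def render_user_text(text: str) -> str:
    # Hypothetical helper: escape &, < and > (and, with quote=True, " and ')
    # so the browser shows them literally instead of parsing them as markup.
    return html.escape(text, quote=True)

print(render_user_text("<html> and <087> stay visible"))
# &lt;html&gt; and &lt;087&gt; stay visible
```

With this approach there is nothing to reject: <html>, <<<<<<< and <087> all reach the page as plain visible text.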
Shockingly, now that you've described your use case, it sounds like regexen will actually work here: you need to prevent <SomeTextHere> from showing up anywhere in the string, and there is no recursion to worry about. The following regex does the opposite of what you want: <[A-Za-z]+> (change the + to a * if you can't allow <> either). It matches wherever such text occurs, so I'd recommend putting the logic in the language instead (e.g., if (!/<[A-Za-z]+>/) { do_something() }).
If you need it in the regex, and your language supports it, you can use a negative look-ahead assertion: ^(?!.*<[A-Za-z]+>). This says "match at the beginning of the string (^) if I can't find ((?!...)) the given text anywhere after it", but the matched string itself will contain no characters.
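Here is a small Python sketch of both approaches, checked against the examples from the question. The function names are hypothetical. One assumption worth flagging: in Python (as in most engines) . does not match a newline by default, so the look-ahead version uses re.DOTALL. Without it, an input like "ok\n<a>" would pass.

```python
import re

# A "tag": one or more ASCII letters between < and >.
# Use * instead of + if the empty pair <> should also be rejected.
TAG = re.compile(r"<[A-Za-z]+>")

def is_allowed(text: str) -> bool:
    # Logic in the language: reject if the tag pattern occurs anywhere.
    return TAG.search(text) is None

# Regex-only variant: anchored negative look-ahead. re.DOTALL lets .
# cross newlines so a tag on a later line is still found.
NO_TAG = re.compile(r"^(?!.*<[A-Za-z]+>)", re.DOTALL)

def is_allowed_lookahead(text: str) -> bool:
    # The match is zero-width (empty); only whether it succeeds matters.
    return NO_TAG.match(text) is not None

allowed = ["hello world", "!@$%^&*()_+'\":;[]{}()\\|#", "<<<<<<<", ">>>>>",
           "<>", "><", "<087>", "<-->"]
rejected = ["<html>", "<a>", "<foo>", "<bar>", "ok\n<a>"]

for s in allowed + rejected:
    print(repr(s), is_allowed(s), is_allowed_lookahead(s))
```

Both functions agree on every example. The first one is easier to read and doesn't depend on look-ahead support or flags.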