I have a strange problem when using FILTER_SANITIZE_STRING on a variable (populated by human input). It seems to strip the < character and any text that comes after that. The > character is left untouched.
I assume it thinks the < is an HTML tag that needs to be stripped. However, there is no closing tag after it, so I haven't got a clue why it would behave like that. Is there a way to make it leave the < in place and still sanitize the way it should?
The root issue is that when you use FILTER_SANITIZE_STRING to strip HTML tags you are handling your input as HTML. According to your description, your input is plain text. As such, the filter can only corrupt the input data, as users have already reported.
While it seems to be quite a popular technique, I've never understood the concept of stripping HTML tags from plain text as a sanitization method. If it isn't HTML, you don't need to care about HTML tags, for the same reason that you don't need to care about SQL keywords or command-line commands. It's nothing but data.
But, of course, when you inject your string into HTML afterwards, you need to escape it so that characters such as < are displayed literally instead of being interpreted as markup.
That's why htmlspecialchars() exists. Similarly, you need to use the corresponding escape mechanism when you dynamically generate any other kind of code: SQL, JavaScript, JSON...
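As a rough sketch of that approach: keep the input exactly as the user typed it and escape it only at the point where it is written into HTML. The sample string and the helper name e() are made up for illustration; the flags shown are one reasonable choice, not the only one.

```php
<?php
// Hypothetical user input: plain text that happens to contain < and >.
$input = 'if a < b then "x" > y & done';

// Hypothetical helper: escape a plain-text value for output into HTML.
// The stored value is never modified, only its HTML representation.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Escape at output time; the browser renders the original text, < included.
echo '<p>' . e($input) . '</p>', PHP_EOL;
// prints: <p>if a &lt; b then &quot;x&quot; &gt; y &amp; done</p>
```

Nothing is lost this way: the < and everything after it survive, and the browser shows them as text rather than trying to parse a tag.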