
FILTER_SANITIZE_STRING is stripping the < character and any text after it

Tags: php

I have a strange problem when using FILTER_SANITIZE_STRING on a variable (populated by human input). It seems to strip the < character and any text that comes after that. The > character is left untouched.

I assume it thinks the < is an HTML tag that needs to be stripped, however there is no closing tag behind it, so I haven't got a clue why it would behave like that. Is there a way to make it leave the < in place, and still sanitize the way it should?

asked Apr 13 '13 14:04 by Sempiterna


1 Answer

The root issue is that when you use FILTER_SANITIZE_STRING to strip HTML tags, you are handling your input as HTML. According to your description, your input is plain text. As such, the filter can only corrupt the input data, as you have already noticed.

While it seems to be quite a popular technique, I've never understood the concept of stripping HTML tags from plain text as a sanitization method. If it isn't HTML, you don't need to care about HTML tags, for the same reason that you don't need to care about SQL keywords or command-line commands. It's nothing but data.

But, of course, when you inject your string into HTML afterwards you need to escape it in order to ensure that:

  1. Your data is displayed as-is
  2. The result is still valid HTML

That's why htmlspecialchars() exists. Similarly, you need to use the corresponding escape mechanism when you dynamically generate any other kind of code: SQL, JavaScript, JSON...
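As a rough illustration of escaping at output time, here is a minimal sketch. The helper names e() and toJsLiteral() and the sample input are made up for this example; only htmlspecialchars() and json_encode() are real PHP functions. It assumes the text is UTF-8 and is kept unmodified until the moment it is written into HTML or JavaScript.

```php
<?php
// Hypothetical user input: plain text, stored and processed as-is.
$input = 'if x < 10 && y > 3, say "hi"';

// Hypothetical helper: escape plain text for an HTML context.
function e(string $text): string
{
    // ENT_QUOTES escapes both double and single quotes, so the result
    // is also safe inside quoted attribute values.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical helper: encode plain text as a JavaScript string literal
// that can be embedded inside a <script> block.
function toJsLiteral(string $text): string
{
    // The JSON_HEX_* flags turn <, >, &, ' and " into \uXXXX escapes,
    // so the literal cannot close the surrounding <script> element.
    return json_encode(
        $text,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// The < and everything after it survive; they are displayed, not interpreted.
echo '<p>' . e($input) . '</p>', PHP_EOL;
// prints: <p>if x &lt; 10 &amp;&amp; y &gt; 3, say &quot;hi&quot;</p>

echo '<script>var msg = ' . toJsLiteral($input) . ';</script>', PHP_EOL;
```

The point of the sketch is that nothing is removed from the input: each output context gets its own escaping step, applied as late as possible. For SQL the equivalent would be bound parameters in a prepared statement rather than string escaping.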

answered Oct 04 '22 01:10 by Álvaro González