I have a PHP web application. I do NOT want to allow users to post HTML to my site.
If I simply run strip_tags() on all data prior to saving it into my database, will strip_tags() be enough to prevent XSS?
I ask because it's unclear to me from reading the documentation of strip_tags() whether XSS is prevented. There seems to be some bug with browsers accepting <0/script> (yes, a zero) as valid HTML.
UPDATE
I realize that I can simply run htmlspecialchars() on all outputted data; however, my thought is that, since I don't want to allow HTML in the first place, it's simply easier (and academically better) to clean my data once and for all before saving it in my database than to worry, every time I output the data, whether it is safe or not.
strip_tags() is perfectly safe if all that you are doing is outputting the text into the HTML body.
Encoding is probably the most important line of XSS defense, but it is not sufficient to prevent XSS vulnerabilities in every context. You should also validate input as strictly as possible at the point when it is first received from a user.
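To make "validate input as strictly as possible" concrete, here is a minimal whitelist-validation sketch. The function names is_valid_username() and parse_age() and the allowed patterns are hypothetical; the idea is to accept only the exact shape a field needs and reject everything else, rather than trying to strip out "bad" characters. It complements output escaping and does not replace it.

```php
<?php
// Hypothetical validator: 3-20 ASCII letters, digits or underscores.
// The /D modifier stops "$" from also matching before a trailing newline.
function is_valid_username(string $input): bool
{
    return preg_match('/^[A-Za-z0-9_]{3,20}$/D', $input) === 1;
}

// Hypothetical validator: filter_var() returns an int in range, or false otherwise.
function parse_age(string $input): ?int
{
    $age = filter_var($input, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);
    return $age === false ? null : $age;
}

var_dump(is_valid_username('alice_99'));    // bool(true)
var_dump(is_valid_username('<script>'));    // bool(false)
var_dump(parse_age('42'));                  // int(42)
var_dump(parse_age('42" onfocus="evil()')); // NULL
```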
Escaping is the primary means to avoid cross-site scripting attacks. When escaping, you are effectively telling the web browser that the data you are sending should be treated as data and should not be interpreted in any other way.
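In PHP, escaping for HTML usually means htmlspecialchars() at the point of output. A minimal sketch of a small helper is below; the name e() is only a common convention, not a built-in. ENT_QUOTES escapes both quote characters, so the same helper is safe inside quoted attributes, and ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning an empty string.

```php
<?php
// Hypothetical helper; the flags passed to htmlspecialchars() are what matter.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');
}

$comment = 'Hey guys <--- look at this! Tom & Jerry say "hi"';
echo '<p>', e($comment), '</p>', "\n";
// <p>Hey guys &lt;--- look at this! Tom &amp; Jerry say &quot;hi&quot;</p>
```

The raw text stays in the database unchanged, and each template decides how to encode it for its own output context.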
I strongly disagree that it's "academically better".
It breaks user input (imagine how useless StackOverflow would be for this discussion if they "cleaned" posts from all tags).
Text inserted in HTML with only tags stripped will be invalid: HTML requires & to be escaped as well.
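A small illustration of the & problem: strip_tags() finds no tags in the text below, so it returns it unchanged, and a browser would then render the literal "&lt;" the user typed as "<". htmlspecialchars() escapes the ampersands, so the user's text is displayed exactly as typed. The sample string is made up for this sketch.

```php
<?php
// Plain text that happens to contain something that looks like an HTML entity.
$input = 'Use &lt; for less-than, and AT&T is a company.';

// No tags to remove, so the ampersands pass through untouched.
$stripped = strip_tags($input);

// "&" becomes "&amp;", so the browser shows the original characters.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n"; // Use &lt; for less-than, and AT&T is a company.
echo $escaped, "\n";  // Use &amp;lt; for less-than, and AT&amp;T is a company.
```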
It's not even safe in HTML! strip_tags() is not enough to protect values in attributes, e.g., <input value="$foo"> might be exploited with $foo = " onfocus="evil() (no < or > needed!)
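The attribute attack can be reproduced directly. The sketch below builds the same <input value="$foo"> markup twice, once with strip_tags() and once with htmlspecialchars() using ENT_QUOTES; only the second keeps the payload inside the attribute value.

```php
<?php
// The payload from the answer: only quotes, no "<" or ">".
$foo = '" onfocus="evil()';

// strip_tags() sees no tags, so the quote closes value="..." and adds a handler.
$unsafe = '<input value="' . strip_tags($foo) . '">';

// ENT_QUOTES turns each " into &quot;, so it stays part of the value.
$safe = '<input value="' . htmlspecialchars($foo, ENT_QUOTES, 'UTF-8') . '">';

echo $unsafe, "\n"; // <input value="" onfocus="evil()">
echo $safe, "\n";   // <input value="&quot; onfocus=&quot;evil()">
```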
So the correct solution is to escape data according to the requirements of the language you're generating. When you have plain text and you're generating HTML, you should convert the text to HTML with htmlspecialchars() or a similar function. When you're generating e-mail, you should convert the text to quoted-printable format, and so on.
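As a rough sketch of "escape according to the language you're generating", here is one plain-text value encoded for a few common output contexts using standard PHP functions. The choice of contexts and the example.com URL are illustrative assumptions; each template would apply only the encoding for the context it actually writes into.

```php
<?php
$text = 'Tom & Jerry <3 "cheese"';

// HTML body or quoted attribute.
$html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// String literal inside an inline <script>: the JSON_HEX_* flags emit <, >, &, ' and "
// as \u00XX escapes, so the value cannot end the string or the script element.
$js = json_encode($text, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

// Query-string parameter in a URL (example.com is a placeholder).
$url = 'https://example.com/search?q=' . rawurlencode($text);

// E-mail body sent with Content-Transfer-Encoding: quoted-printable.
$mail = quoted_printable_encode('café');

echo $html, "\n"; // Tom &amp; Jerry &lt;3 &quot;cheese&quot;
echo $js, "\n";   // "Tom \u0026 Jerry \u003C3 \u0022cheese\u0022"
echo $url, "\n";  // https://example.com/search?q=Tom%20%26%20Jerry%20%3C3%20%22cheese%22
echo $mail, "\n"; // caf=C3=A9
```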
strip_tags() by itself is not going to be sufficient, as it removes perfectly valid, non-HTML content. For instance:
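<?php
// strip_tags() treats "<" as the start of a tag and drops everything after it,
// including a "\n" inside the string, so the newline is echoed separately.
echo strip_tags("This could be a happy clown *<:) or a puckered face."), "\n";
echo strip_tags("Hey guys <--- look at this!"), "\n";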
Will output:
This could be a happy clown *
And:
Hey guys
Everything after the initial < gets removed. Very annoying for end users! Disallowing reserved HTML characters would be a bad move; instead, these characters need to be escaped with htmlentities() or a similar function when used inline with HTML.
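For comparison, escaping the same two messages instead of stripping them keeps every character; only "<" is turned into an entity, which the browser displays as the original symbol. This sketch uses htmlspecialchars(), which is enough for the reserved characters here; htmlentities() would additionally convert non-ASCII characters to named entities.

```php
<?php
$messages = [
    "This could be a happy clown *<:) or a puckered face.",
    "Hey guys <--- look at this!",
];

// Escape rather than strip: the full message survives.
foreach ($messages as $message) {
    echo htmlspecialchars($message, ENT_QUOTES, 'UTF-8'), "\n";
}
// This could be a happy clown *&lt;:) or a puckered face.
// Hey guys &lt;--- look at this!
```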
You need something more advanced than strip_tags(): HTML Purifier works great and will allow users to use HTML reserved characters.