Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best practice for HTML escaping user-supplied data with PHP (and ZF)

Note: I'm using Zend Framework, but I think most of this applies to PHP coding in general.

I'm trying to choose a strategy for writing views scripts, possibly with the help of a templating engine. Motivations: clarity and security. I'm just not happy with writing .phtml scripts. This syntax is awfully verbose to do the most often needed thing - outputting a variable:

<?php echo $this->escape($this->myVariable); ?>

In addition to the code being lengthy, IMHO the template author shouldn't have to remember (and bother) writing an escape call each time he/she wants to output a variable. Forgetting the call will almost definitely result in an XSS vulnerability.

I have two possible solutions for this problem:

Solution 1: A template engine with automatic escaping

I think at least Smarty has an option for automatically escaping html entities when outputting variables. There are points against Smarty, but maybe at least some of them are addressed in the upcoming 3.0 - I haven't checked yet.

XML based template engines like PHPTAL will also escape any data by default. They might look quite odd for a beginner, though. Maybe still worth trying?

Solution 2: Escape the data in the Model

Of course, the other option would be to escape the needed data already in the Model (or even the controller?). The Model should already know the content-type (mainly plain text or HTML text) of each field, so it would be kind of logical to escape the data there. The view could consider all data as safe HTML. This would allow eg. changing the datatype of a field from plain text to HTML without touching the view script - only by changing the Model.

But then again, it doesn't feel like good MVC practice. In addition, there are problems with this approach as well:

  • sometimes the view only wants to print the first n characters, and we don't want to end up truncating the data foo & bar as foo &am (having first escaped it as foo &amp; bar)
  • maybe the view wants to construct an URL with varName=$varName in the querystring - again, escaping already in the Model would be bad.

(These problems could be addressed by providing two versions of the data, or unescaping in the template. Seems bad to me.)

Ideas? Am I missing something? What do you consider "the best practice"?

PS. This post is about finding a general solution for any user-supplied plain-text data that may contain < or > or any other characters. So, filtering data before saving it to the database isn't the solution.

Update:

Thanks for all comments so far. I did some more research and will next evaluate Twig and possibly Open Power Template. Both seem interesting: Twig looks very straightforward, but the project is young. On the XML side, OPT's syntax looks a bit nicer than PHPTAL's. Both Twig and OPT are quite well documented.

like image 867
tuomassalo Avatar asked Nov 18 '09 17:11

tuomassalo


3 Answers

  1. Filter as soon as possible. You should ensure that all text input is proper UTF-8, to make your text manipulation functions work predictably.

    But don't try to filter out "dangerous" characters or fragments! That doesn't work. Only fix or reject incorrect data on input. There's nothing incorrect in < or ' characters.

  2. Escape as late as possible. Add SQL escaping in your SQL query function (or better – use prepared statements). HTML-escape in your HTML templates. Quoted-Printable-escape in your e-mail generation functions, shell-escape when running CLI commands, etc.

    Don't let escaped data spread all over your application, because the longer escaped data lives, the bigger chance you'll mix it up with unescaped data or break escaping during processing.

like image 116
Kornel Avatar answered Sep 26 '22 19:09

Kornel


But then again, it doesn't feel like good MVC practice.

Totally agree, the Model's the wrong place for such presentation concerns and storing both an HTML and a raw version of every variable would make it easy for them to get out of sync. Forget solution 2.

That leaves you with alternative templating engines, or sticking with PHP and learning to bear the load of calling htmlspecialchars all the time. I'm open to the idea of alternative templating entries, but the ones I've tried so far I haven't really been happy with.

(Many discard PHP syntax and implement their own limited expression languages, which means you lose the advantage of the language you already know and are stuck with a noddy-language which makes more complex presentation logic impossible, so you end up doing it yourself in PHP with strings full of HTML, which is absolutely not a win.)

So for the moment I'd suggest a Solution 0a to add to the pile: define a global function with a short name to take the pain out of HTML-escaping:

<?php
    function h($s) {
        echo(htmlspecialchars($s, ENT_QUOTES));
    }
?>
...

My lovely variable is <?php h($this->myVariable); ?>.

I've no idea why PHP doesn't define a shortcut for this, which is as you say by far the most common use case. Now they've dumped the short-tags for XML-PI-style tags, why isn't there one with another name to do the right thing, like say <?phph?

like image 25
bobince Avatar answered Sep 24 '22 19:09

bobince


This isn't a total solution, but one extremely helpful thing in this sort of situation is hungarian-style notation. Hungarian notation used all the time is just annoying, to me, but this is the kind of place where that sort of metadata in the variable name is very valuable. A good practice is to name your variables with a prefix that says what to expect from it...i.e. $rawUserInput, $escapedUserInput, etc.

This doesn't totally solve the problem, but it's a good coding practice. Then when you see a snippet of code that says

'SELECT * from table where username = ' + $rawUserName

it's immediately obvious that there's an injection vulnerability, because you know the raw prefix means you haven't escaped it.

like image 20
Brian Schroth Avatar answered Sep 26 '22 19:09

Brian Schroth