Escaping only what is necessary, is that possible?

Question

I am working with a team of developers on a website. The website will be using classes. I am in charge of creating the data access layer for the classes. There is an understanding that all user input will be escaped upon retrieval (from POST or GET). Having little control over the input level (unless I personally review everyone's code), I thought it would be cool to throw in escaping on my end as well (right before it hits the database). The problem is that I don't know how to use mysql_real_escape_string without adding even more slashes to data that has already been escaped.

Since user input may very well contain a backslash of its own, I can't just check whether there are slashes in it. I might be able to check for all the things that need escaping and make sure they have a slash in front of them, but that doesn't seem like the best way to do it.

Any suggestions?

derobert · Accepted Answer

Have you considered not escaping the data until it hits the data access layer? I ask because there are some gotchas with the approach your team is taking:

If you need to display form data to the user (e.g., to redisplay the form with an error message because some validation failed), you need to de-escape the data (because ' is not special to HTML) and then re-escape the data (because < is special). If you need to display data pulled from the database to the user, you mustn't do that de-escape step (because it was done by the database, when the data was saved), but still must do the HTML escape step. If you make a mistake and do the wrong procedure, you corrupt data or, worse, introduce security problems.
You can deal with the different-formats-from-different-sources problem by deciding that all data passed around your app will be escaped. So, your data access layer will re-escape the data upon getting it from the database. But, as different parts of the app need slightly (or completely) different escapes, this quickly leads to a lot of de-escape/re-escape nonsense. Grab the data from the database, escape it, de-escape it, escape it for HTML, output it.
Your front-end form handling code has to have intimate knowledge of your database. For example, what does \' mean to your database? How should a ' or \ be escaped, if at all? If you ever change your database engine, or even change its settings, those may change. And then you have a bunch of escaping/de-escaping code to find. Missing a single escape/de-escape may lead to SQL injection.
Alternatively, you can take that knowledge of the database out of the front-end code by having the database layer do a de-escape/escape cycle to convert from your app-standard escape sequence to your database's. But this seems rather silly!
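
To make the first gotcha concrete, here is a small sketch. It assumes the front end escapes input with something like addslashes(); the question doesn't say which function the team actually uses, so treat that as a stand-in.

```php
<?php
// Raw value exactly as the user typed it.
$raw = "O'Reilly <b>";

// Assumption: the front end escapes on input with something like addslashes().
$escapedAtInput = addslashes($raw);   // O\'Reilly <b>

// Redisplaying the form as-is is wrong for HTML:
// the backslash shows up on the page and < is still live markup.
$wrong = $escapedAtInput;

// The correct procedure needs a de-escape followed by an HTML escape.
$right = htmlspecialchars(stripslashes($escapedAtInput), ENT_QUOTES, 'UTF-8');

echo $wrong, "\n";
echo $right, "\n";
```

Every place that outputs the value has to remember both steps, in the right order, and has to know which source the value came from.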

There is another way: Let whichever layer needs the data escaped escape it itself. Data is always passed between layers in raw, unescaped form. So your data access layer does all database escaping. Your HTML output code does all HTML escaping. When you decide you want to generate PDFs, your PDF code does all PDF escaping.

Now, when you do form output, it's clear what to do: always HTML escape the data. No matter where it came from. Never run a de-escape.
There is now no de-escape/escape nonsense, as everything is passed around raw. It is only escaped when necessary.
Your front-end code doesn't care about the data access layer's implementation. The data access layer stores and returns any arbitrary string.
You have only one place to look in your app to make sure you have no SQL injection problems.
You can easily make use of database driver features such as placeholders. Then not even your data access layer needs to be aware of each database's escaping requirements; the database driver handles it.
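
As a rough sketch of what that could look like, here is a tiny data access class using PDO placeholders. The class, table, and column names are made up for illustration, and it uses an in-memory SQLite database only so the example runs on its own; with MySQL you would presumably swap in a mysql: DSN.

```php
<?php
// Hypothetical data access class; table and column names are invented.
final class CommentStore
{
    private PDO $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
        $this->db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    }

    // Accepts the raw string; the driver binds it, so no manual escaping.
    public function add(string $body): int
    {
        $stmt = $this->db->prepare('INSERT INTO comments (body) VALUES (:body)');
        $stmt->execute([':body' => $body]);
        return (int) $this->db->lastInsertId();
    }

    // Returns the raw string exactly as it was stored, or null if not found.
    public function get(int $id): ?string
    {
        $stmt = $this->db->prepare('SELECT body FROM comments WHERE id = :id');
        $stmt->execute([':id' => $id]);
        $body = $stmt->fetchColumn();
        return $body === false ? null : $body;
    }
}

// In-memory SQLite keeps the sketch self-contained (an assumption, not MySQL).
$pdo = new PDO('sqlite::memory:');
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

$store = new CommentStore($pdo);
$id = $store->add("O'Reilly \\ <b>bold</b>");
$stored = $store->get($id);

// The output layer, and only the output layer, does the HTML escaping.
$html = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo $html, "\n";
```

The front-end code passes in and gets back the raw string, quotes and backslashes included, and never needs to know how the database wants them escaped.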

Fredrik · Answer

There is no way to decide automatically whether to escape if you don't know whether the input has already been escaped. You can attempt to analyze it, but it will never be reliable, and you will end up with doubled backslashes and the like.
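
A quick illustration of why the analysis can't work, again assuming addslashes()-style escaping on the calling side (the variable names are just for the example):

```php
<?php
// Two different things a user might have typed.
$typedA = "it's";      // contains a quote
$typedB = "it\\'s";    // literally contains a backslash followed by a quote

// Assumption: one caller escaped its input, another caller forgot to.
$fromCallerX = addslashes($typedA);   // it\'s
$fromCallerY = $typedB;               // it\'s

// The data layer receives identical strings and cannot tell them apart.
var_dump($fromCallerX === $fromCallerY);

// Escaping "just in case" corrupts the value that was already escaped.
$doubleEscaped = addslashes($fromCallerX);   // it\\\'s
echo $doubleEscaped, "\n";
```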

Decide once that data sent to your access layer should be clean (that is, unescaped) and handle the escaping in one place. If you do, the other developers will not have to worry about it (they probably don't want to anyway) and it will be much easier to move to another database in the future. It will also give you the freedom to move over to prepared statements at any time.

Edit: Forgot this:

Having little control over the input level (unless I personally review everyone's code)

I think it is worth the pain to have them discover it themselves if you just make it very clear that escaping is something that belongs to the database layer and should not be done elsewhere.

Tags: php, mysql

Justin Giboney