I feel a little silly for asking this since I seem to be the only person in the world who doesn't get it, but here goes anyway. I'm going to use Python as an example. When I use raw SQL queries (I usually use ORMs) I use parameterisation, like this example using SQLite:
Method A:
username = "wayne" query_params = (username) cursor.execute("SELECT * FROM mytable WHERE user=?", query_params)
I know this works and I know this is the generally recommended way to do it. A SQL injection-vulnerable way to do the same thing would be something like this:
Method B:
username = "wayne" cursor.execute("SELECT * FROM mytable WHERE user='%s'" % username)
As far I can tell I understand SQL injection, as explained in this Wikipedia article. My question is simply: How is method A really different to method B? Why is the end result of method A not the same as method B? I assume that the cursor.execute()
method (part of Python's DB-API specification) takes care of correctly escaping and type-checking the input, but this is never explicitly stated anywhere. Is that all that parameterisation in this context is? To me, when we say "parameterisation", all that means is "string substitution", like %-formatting. Is that incorrect?
A parameterized query doesn't actually do string replacement. If you use string substitution, then the SQL engine actually sees a query that looks like
SELECT * FROM mytable WHERE user='wayne'
If you use a ?
parameter, then the SQL engine sees a query that looks like
SELECT * FROM mytable WHERE user=<some value>
Which means that before it even sees the string "wayne", it can fully parse the query and understand, generally, what the query does. It sticks "wayne" into its own representation of the query, not the SQL string that describes the query. Thus, SQL injection is impossible, since we've already passed the SQL stage of the process.
(The above is generalized, but it more or less conveys the idea.)
When you do text replacement (like your method B), you have to be wary of quotes and such, because the server will get a single piece of text, and it have to determine where the value ends.
With parameterized statements, OTOH, the DB server gets the statement as is, without the parameter. The value is sent to the server as a different piece of data, using a simple binary safe protocol. Therefore, your program doesn't have to put quotes around the value, and of course it doesn't matter if there were already quotes in the value itself.
An analogy is about source and compiled code: in your method B, you're building the source code of a procedure, so you have to be sure to strictly follow the language syntax. With Method A, you first build and compile a procedure, then (immediately after, in your example), you call that procedure with your value as a parameter. And of course, in-memory values aren't subject to syntax limitations.
Umm... that wasn't really an analogy, it's really what is happening under the hood (roughly).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With