
Table name as a PostgreSQL function parameter

Tags: function, postgresql, identifier, dynamic-sql, plpgsql

I want to pass a table name as a parameter in a Postgres function. I tried this code:

CREATE OR REPLACE FUNCTION some_f(param character varying) RETURNS integer
AS $$
    BEGIN
    IF EXISTS (select * from quote_ident($1) where quote_ident($1).id=1) THEN
     return 1;
    END IF;
    return 0;
    END;
$$ LANGUAGE plpgsql;

select some_f('table_name');

And I got this:

ERROR:  syntax error at or near "."
LINE 4: ...elect * from quote_ident($1) where quote_ident($1).id=1)...
                                                             ^

********** Error **********

ERROR: syntax error at or near "."

And here is the error I got when I changed the query to select * from quote_ident($1) tab where tab.id=1:

ERROR:  column tab.id does not exist
LINE 1: ...T EXISTS (select * from quote_ident($1) tab where tab.id...

Probably, quote_ident($1) works, because without the where quote_ident($1).id=1 part I get 1, which means something is selected. Why would the first quote_ident($1) work while the second one does not? And how could this be solved?


asked May 22 '12 15:05

John Doe

2 Answers

This can be further simplified and improved:

CREATE OR REPLACE FUNCTION some_f(_tbl regclass, OUT result integer)
    LANGUAGE plpgsql AS
$func$
BEGIN
   EXECUTE format('SELECT (EXISTS (SELECT FROM %s WHERE id = 1))::int', _tbl)
   INTO result;
END
$func$;

Call with schema-qualified name (see below):

SELECT some_f('myschema.mytable');  -- would fail with quote_ident()

Or:

SELECT some_f('"my very uncommon table name"');

Major points

Use an OUT parameter to simplify the function. You can directly select the result of the dynamic SQL into it and be done. No need for additional variables and code.

EXISTS does exactly what you want. You get true if the row exists or false otherwise. There are various ways to do this; EXISTS is typically the most efficient.

You seem to want an integer back, so I cast the boolean result from EXISTS to integer, which yields exactly what you had. I would return boolean instead.
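
A boolean-returning variant might look like the following sketch. The function name some_f_bool and the demo table mytable are assumptions made up for illustration; they are not part of the original answer.

```sql
-- Hypothetical demo table; any table with an "id" column would do.
CREATE TABLE IF NOT EXISTS mytable (id integer PRIMARY KEY);
INSERT INTO mytable (id) VALUES (1) ON CONFLICT DO NOTHING;

-- Same technique as some_f(), but the OUT parameter is boolean,
-- so the result of EXISTS needs no cast.
CREATE OR REPLACE FUNCTION some_f_bool(_tbl regclass, OUT result boolean)
    LANGUAGE plpgsql AS
$func$
BEGIN
   EXECUTE format('SELECT EXISTS (SELECT FROM %s WHERE id = 1)', _tbl)
   INTO result;
END
$func$;

-- Yields true as long as mytable holds a row with id = 1.
SELECT some_f_bool('mytable');
```

A caller can then use the function directly in a condition, without comparing the result to 1.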

I use the object identifier type regclass as input type for _tbl. That does everything quote_ident(_tbl) or format('%I', _tbl) would do, but better, because:

.. it prevents SQL injection just as well.
.. it fails immediately and more gracefully if the table name is invalid / does not exist / is invisible to the current user. (A regclass parameter is only applicable for existing tables.)
.. it works with schema-qualified table names, where a plain quote_ident(_tbl) or format('%I', _tbl) would fail because they cannot resolve the ambiguity. You would have to pass and escape schema and table names separately.

It only works for existing tables, obviously.
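
The difference between the three approaches can be seen in a short sketch. The schema myschema and the table mytable are hypothetical and created only for this demonstration.

```sql
-- Hypothetical schema and table, created only for this demonstration.
CREATE SCHEMA IF NOT EXISTS myschema;
CREATE TABLE IF NOT EXISTS myschema.mytable (id integer);

-- quote_ident() treats the whole string as ONE identifier, so the dot
-- ends up inside the double quotes and no longer separates schema and table.
SELECT quote_ident('myschema.mytable');         -- "myschema.mytable"

-- Schema and table passed and escaped separately with format() and %I.
SELECT format('%I.%I', 'myschema', 'mytable');  -- myschema.mytable

-- The regclass cast resolves the name against the system catalogs.
-- Casting the name of a table that does not exist raises an error instead.
SELECT 'myschema.mytable'::regclass;
```

Because the cast happens when the argument is passed, an invalid name is rejected before the function body runs.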

I still use format(), because it simplifies the syntax (and to demonstrate how it's used), but with %s instead of %I. Typically, queries are more complex so format() helps more. For the simple example we could as well just concatenate:

EXECUTE 'SELECT (EXISTS (SELECT FROM ' || _tbl || ' WHERE id = 1))::int'

No need to table-qualify the id column while there is only a single table in the FROM list. No ambiguity possible in this example. (Dynamic) SQL commands inside EXECUTE have a separate scope: function variables and parameters are not visible there, as opposed to plain SQL commands in the function body.
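
Since function parameters are not visible inside the dynamic command, values are usually handed over with a USING clause, while identifiers still go through format(). A minimal sketch, assuming a hypothetical function some_f_id with an extra _id parameter and a demo table mytable:

```sql
-- Hypothetical demo table for this sketch.
CREATE TABLE IF NOT EXISTS mytable (id integer PRIMARY KEY);
INSERT INTO mytable (id) VALUES (1) ON CONFLICT DO NOTHING;

CREATE OR REPLACE FUNCTION some_f_id(_tbl regclass, _id integer, OUT result boolean)
    LANGUAGE plpgsql AS
$func$
BEGIN
   -- $1 inside the dynamic string refers to the first USING argument,
   -- not to the first function parameter.
   EXECUTE format('SELECT EXISTS (SELECT FROM %s WHERE id = $1)', _tbl)
   INTO  result
   USING _id;
END
$func$;

SELECT some_f_id('mytable', 1);   -- true while a row with id = 1 exists
SELECT some_f_id('mytable', -1);  -- false while no row with id = -1 exists
```

Passing the value this way keeps it out of the SQL string, so it needs no quoting or escaping.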

Here's why you always escape user input for dynamic SQL properly:

db<>fiddle here demonstrating SQL injection
Old sqlfiddle

answered Sep 22 '22 01:09

Erwin Brandstetter

If at all possible, don't do this.

That's the answer—it's an anti-pattern. If the client knows the table it wants data from, then SELECT FROM ThatTable. If a database is designed in a way that this is required, it seems to be designed sub-optimally. If a data access layer needs to know whether a value exists in a table, it is easy to compose SQL in that code, and pushing this code into the database is not good.

To me this seems like installing a device inside an elevator where one can type in the number of the desired floor. After the Go button is pressed, it moves a mechanical hand over to the correct button for the desired floor and presses it. This introduces many potential issues.

Please note: no mockery is intended here. My silly elevator example was the very best device I could imagine for succinctly pointing out issues with this technique. It adds a useless layer of indirection, moving table name choice from a caller space (using a robust and well-understood DSL, SQL) into a hybrid using obscure/bizarre server-side SQL code.

Such responsibility-splitting through movement of query construction logic into dynamic SQL makes the code harder to understand. It violates a standard and reliable convention (how a SQL query chooses what to select) in the name of custom code fraught with potential for error.

Here are detailed points on some of the potential problems with this approach:

Dynamic SQL offers the possibility of SQL injection that is hard to recognize in the front end code or the back end code alone (one must inspect them together to see this).
Stored procedures and functions can access resources that the SP/function owner has rights to but the caller doesn't. As far as I understand, unless you take special care, the database by default executes dynamic SQL produced inside such code under the rights of the caller. This means you either won't be able to use privileged objects at all, or you have to open them up to all clients, increasing the surface area of potential attack on privileged data. Setting the SP/function at creation time to always run as a particular user (in SQL Server, EXECUTE AS; in PostgreSQL, SECURITY DEFINER runs the function with its owner's rights) may solve that problem, but makes things more complicated. It also exacerbates the risk of SQL injection mentioned in the previous point, by making the dynamic SQL a very enticing attack vector.
When developers must understand what the application code is doing in order to modify it or fix a bug, they'll find it very difficult to get the exact SQL query being executed. SQL profiler can be used, but this takes special privileges and can have negative performance effects on production systems. The executed query can be logged by the SP, but this increases complexity for questionable benefit (requiring accommodating new tables, purging old data, etc.) and is quite non-obvious. In fact, some applications are architected such that developers do not have database credentials, so it becomes almost impossible for them to actually see the query being submitted.
When an error occurs, such as when you try to select from a table that doesn't exist, you'll get a message along the lines of "invalid object name" from the database. That will happen exactly the same whether you're composing the SQL in the back end or the database, but the difference is that some poor developer who's trying to troubleshoot the system has to spelunk one level deeper into yet another cave below the one where the problem exists, digging into the wonder-procedure that Does It All to try to figure out what the problem is. Logs won't show "Error in GetWidget"; they will show "Error in OneProcedureToRuleThemAllRunner". This abstraction will generally make a system worse.

An example in pseudo-C# of switching table names based on a parameter:

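// EscapeSqlIdentifier stands for whatever identifier-quoting helper the data access layer provides.
string sql = $"SELECT * FROM {EscapeSqlIdentifier(tableName)};";
var results = connection.Execute(sql);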

While this does not eliminate every possible issue imaginable, the flaws I outlined with the other technique are absent from this example.

answered Sep 24 '22 01:09

ErikE
