Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Execute a dynamic crosstab query

I implemented this function in my Postgres database: http://www.cureffi.org/2013/03/19/automatically-creating-pivot-table-column-names-in-postgresql/

Here's the function:

create or replace function xtab (tablename varchar, rowc varchar, colc varchar, cellc varchar, celldatatype varchar) returns varchar language plpgsql as $$
declare
    dynsql1 varchar;
    dynsql2 varchar;
    columnlist varchar;
begin
    -- 1. retrieve list of column names.
    dynsql1 = 'select string_agg(distinct '||colc||'||'' '||celldatatype||''','','' order by '||colc||'||'' '||celldatatype||''') from '||tablename||';';
    execute dynsql1 into columnlist;
    -- 2. set up the crosstab query
    dynsql2 = 'select * from crosstab (
 ''select '||rowc||','||colc||','||cellc||' from '||tablename||' group by 1,2 order by 1,2'',
 ''select distinct '||colc||' from '||tablename||' order by 1''
 )
 as ct (
 '||rowc||' varchar,'||columnlist||'
 );';
    return dynsql2;
end
$$;

So now I can call the function:

select xtab('globalpayments','month','currency','(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)','text');

Which returns (because the return type of the function is varchar):

select * from crosstab (
   'select month,currency,(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)
    from globalpayments
    group by 1,2
    order by 1,2'
 , 'select distinct currency
    from globalpayments
    order by 1'
   )  as ct ( month varchar,CAD text,EUR text,GBP text,USD text );

How can I get this function to not only generate the code for the dynamic crosstab, but also execute the result? I.e., the result when I manually copy/paste/execute is this. But I want it to execute without that extra step: the function shall assemble the dynamic query and execute it:

query result

Edit 1

This function comes close, but I need it to return more than just the first column of the first record

Taken from: Are there any way to execute a query inside the string value (like eval) in PostgreSQL?

create or replace function eval( sql  text ) returns text as $$
declare
  as_txt  text;
begin
  if  sql is null  then  return null ;  end if ;
  execute  sql  into  as_txt ;
  return  as_txt ;
end;
$$ language plpgsql

usage: select * from eval($$select * from analytics limit 1$$)

However it just returns the first column of the first record :

eval
----
2015

when the actual result looks like this:

Year, Month, Date, TPV_USD
---- ----- ------ --------
2016, 3, 2016-03-31, 100000
like image 565
skilbjo Avatar asked Apr 22 '16 22:04

skilbjo


2 Answers

What you ask for is impossible. SQL is a strictly typed language. PostgreSQL functions need to declare a return type (RETURNS ..) at the time of creation.

A limited way around this is with polymorphic functions. If you can provide the return type at the time of the function call. But that's not evident from your question.

  • Refactor a PL/pgSQL function to return the output of various SELECT queries

You can return a completely dynamic result with anonymous records. But then you are required to provide a column definition list with every call. And how do you know about the returned columns? Catch 22.

There are various workarounds, depending on what you need or can work with. Since all your data columns seem to share the same data type, I suggest to return an array: text[]. Or you could return a document type like hstore or json. Related:

  • Dynamic alternative to pivot with CASE and GROUP BY

  • Dynamically convert hstore keys into columns for an unknown set of keys

But it might be simpler to just use two calls: 1: Let Postgres build the query. 2: Execute and retrieve returned rows.

  • Selecting multiple max() values using a single SQL statement

I would not use the function from Eric Minikel as presented in your question at all. It is not safe against SQL injection by way of maliciously malformed identifiers. Use format() to build query strings unless you are running an outdated version older than Postgres 9.1.

A shorter and cleaner implementation could look like this:

CREATE OR REPLACE FUNCTION xtab(_tbl regclass, _row text, _cat text
                              , _expr text  -- still vulnerable to SQL injection!
                              , _type regtype)
  RETURNS text
  LANGUAGE plpgsql AS
$func$
DECLARE
   _cat_list text;
   _col_list text;
BEGIN

-- generate categories for xtab param and col definition list    
EXECUTE format(
 $$SELECT string_agg(quote_literal(x.cat), '), (')
        , string_agg(quote_ident  (x.cat), %L)
   FROM  (SELECT DISTINCT %I AS cat FROM %s ORDER BY 1) x$$
 , ' ' || _type || ', ', _cat, _tbl)
INTO  _cat_list, _col_list;

-- generate query string
RETURN format(
'SELECT * FROM crosstab(
   $q$SELECT %I, %I, %s
      FROM   %I
      GROUP  BY 1, 2  -- only works if the 3rd column is an aggregate expression
      ORDER  BY 1, 2$q$
 , $c$VALUES (%5$s)$c$
   ) ct(%1$I text, %6$s %7$s)'
, _row, _cat, _expr  -- expr must be an aggregate expression!
, _tbl, _cat_list, _col_list, _type);

END
$func$;

Same function call as your original version. The function crosstab() is provided by the additional module tablefunc which has to be installed. Basics:

  • PostgreSQL Crosstab Query

This handles column and table names safely. Note the use of object identifier types regclass and regtype. Also works for schema-qualified names.

  • Table name as a PostgreSQL function parameter

However, it is not completely safe while you pass a string to be executed as expression (_expr - cellc in your original query). This kind of input is inherently unsafe against SQL injection and should never be exposed to the general public.

  • SQL injection in Postgres functions vs prepared queries

Scans the table only once for both lists of categories and should be a bit faster.

Still can't return completely dynamic row types since that's strictly not possible.

like image 133
Erwin Brandstetter Avatar answered Sep 30 '22 15:09

Erwin Brandstetter


Not quite impossible, you can still execute it (from a query execute the string and return SETOF RECORD.

Then you have to specify the return record format. The reason in this case is that the planner needs to know the return format before it can make certain decisions (materialization comes to mind).

So in this case you would EXECUTE the query, return the rows and return SETOF RECORD.

For example, we could do something like this with a wrapper function but the same logic could be folded into your function:

CREATE OR REPLACE FUNCTION crosstab_wrapper
(tablename varchar, rowc varchar, colc varchar, 
 cellc varchar, celldatatype varchar) 
returns setof record language plpgsql as $$
    DECLARE outrow record;
    BEGIN
       FOR outrow IN EXECUTE xtab($1, $2, $3, $4, $5)
       LOOP
           RETURN NEXT outrow
       END LOOP;
    END;
 $$;

Then you supply the record structure on calling the function just like you do with crosstab. Then when you all the query you would have to supply a record structure (as (col1 type, col2 type, etc) like you do with connectby.

like image 40
Chris Travers Avatar answered Sep 30 '22 15:09

Chris Travers