
Robust approach for building SQL queries programmatically

I have to resort to raw SQL where the ORM falls short (using Django 1.7). The problem is that most of the queries end up being 80-90% similar. I cannot figure out a robust and secure way to build queries without sacrificing reusability.

Is string concatenation the only way out, i.e., building parameter-less query strings with if-else conditions and then safely supplying the parameters through prepared statements (to avoid SQL injection)? I want a simple approach to templating SQL for my project instead of reinventing a mini ORM.
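A minimal sketch of the approach described above, for illustration only. The SQL text grows through if/else branches built from constant fragments, and every value goes into a parallel parameter list. It uses Python's built-in sqlite3 module as a stand-in for a Django connection (Django's cursor uses %s placeholders rather than ?). The people_query helper and the sample rows are hypothetical.

```python
import sqlite3


def people_query(team=None, min_id=None):
    """Hypothetical helper: constant SQL fragments plus a parallel list of values."""
    conditions, params = [], []
    if team is not None:
        conditions.append("team = ?")   # the fragment is a fixed string
        params.append(team)             # the value is bound, never concatenated
    if min_id is not None:
        conditions.append("id >= ?")
        params.append(min_id)
    sql = "SELECT id, name, team FROM people"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + " ORDER BY id", params


# Hypothetical sample data for the people table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, team TEXT);
    INSERT INTO people VALUES (1, 'ann', 'red'), (2, 'bob', 'blue'), (3, 'cy', 'red');
""")

sql, params = people_query(team="red", min_id=2)
print(sql)                                   # parameter-less query text
print(conn.execute(sql, params).fetchall())  # values supplied separately
```

Placeholders can only carry values. Anything structural, such as a join keyword or an aggregate function name, has to be picked from a fixed set of fragments instead.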

For example, consider this query:

SELECT id, name, team, rank_score
FROM
  ( SELECT id, name, team
    ROW_NUMBER() OVER (PARTITION BY team
                       ORDER BY count_score DESC) AS rank_score
    FROM 
      (SELECT id, name, team
       COUNT(score) AS count_score
       FROM people
       INNER JOIN scores on (scores.people_id = people.id)
       GROUP BY id, name, team
      ) AS count_table
  ) AS rank_table
WHERE rank_score < 3

How can I:

a) add an optional WHERE constraint on people,
b) change INNER JOIN to LEFT OUTER JOIN,
c) change COUNT to SUM, or
d) skip the OVER / PARTITION clause completely?

asked Aug 07 '14 15:08

1 Answer

Better query

For starters, you can fix the syntax errors and simplify and clarify the query quite a bit:

SELECT *
FROM  (
   SELECT p.person_id, p.name, p.team, sum(s.score)::int AS score
         ,rank() OVER (PARTITION BY p.team
                       ORDER BY sum(s.score) DESC)::int AS rnk
    FROM  person p
    JOIN  score  s USING (person_id)
    GROUP BY 1
   ) sub
WHERE  rnk < 3;
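  • This builds on my updated table layout. See the fiddle below.

  • You do not need the additional subquery. Window functions are executed after aggregate functions, so you can nest an aggregate inside a window function as demonstrated.

  • Since you are talking about a "rank", you probably want rank(), not row_number(): rank() gives tied rows the same rank, while row_number() assigns them distinct numbers in arbitrary order.

  • Assuming person.person_id is the PK, you can simplify GROUP BY to that one column; PostgreSQL 9.1+ recognizes the other columns of person as functionally dependent on it.

  • Be sure to table-qualify all column names that might be ambiguous.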
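The two points about window functions (they run after aggregation, and rank() differs from row_number() on ties) can be checked with a small sketch. It uses Python's sqlite3 module instead of PostgreSQL and assumes an SQLite build of 3.25 or later, the first version with window functions. The sample data is made up.

```python
import sqlite3

# Made-up sample data shaped like the answer's person/score tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, team TEXT);
    CREATE TABLE score  (person_id INTEGER REFERENCES person, score INTEGER);
    INSERT INTO person VALUES (1, 'ann', 'red'), (2, 'bob', 'red'),
                              (3, 'cy', 'red'), (4, 'dee', 'blue');
    INSERT INTO score VALUES (1, 5), (1, 5), (2, 10), (3, 4), (4, 7);
""")

# sum() is computed per GROUP BY group first; the window functions then
# rank those group results, so no extra subquery level is needed.
rows = conn.execute("""
    SELECT p.name, p.team, sum(s.score) AS total,
           rank()       OVER (PARTITION BY p.team
                              ORDER BY sum(s.score) DESC) AS rnk,
           row_number() OVER (PARTITION BY p.team
                              ORDER BY sum(s.score) DESC) AS rn
    FROM   person p
    JOIN   score  s USING (person_id)
    GROUP  BY p.person_id
    ORDER  BY p.team, rnk, p.name
""").fetchall()

for row in rows:
    print(row)
```

Here ann and bob tie at 10 and both get rnk 1, while row_number() numbers them 1 and 2 in an unspecified order. cy gets rank 3, so the outer WHERE rnk < 3 in the answer would drop that row.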

PL/pgSQL function

Then I would write a PL/pgSQL function that takes parameters for the variable parts, implementing points a) to c) of your question. Point d) is unclear, so I leave that for you to add.

CREATE OR REPLACE FUNCTION f_demo(_agg text       DEFAULT 'sum'
                               , _left_join bool  DEFAULT FALSE
                               , _where_name text DEFAULT NULL)
  RETURNS TABLE(person_id int, name text, team text, score int, rnk int) AS
$func$
DECLARE
   _agg_op  CONSTANT text[] := '{count, sum, avg}';  -- allowed functions
   _sql     text;
BEGIN

-- assert --
IF _agg ILIKE ANY (_agg_op) THEN
   -- all good
ELSE
   RAISE EXCEPTION '_agg must be one of %', _agg_op;
END IF;

-- query --
_sql := format('
SELECT *
FROM  (
   SELECT p.person_id, p.name, p.team, %1$s(s.score)::int AS score
         ,rank() OVER (PARTITION BY p.team
                       ORDER BY %1$s(s.score) DESC)::int AS rnk
    FROM  person p
    %2$s  score  s USING (person_id)
    %3$s
    GROUP BY 1
   ) sub
WHERE  rnk < 3
ORDER  BY team, rnk'
   , _agg
   , CASE WHEN _left_join THEN 'LEFT JOIN' ELSE 'JOIN' END
   , CASE WHEN _where_name <> '' THEN 'WHERE p.name LIKE $1' ELSE '' END
);

-- debug: comment out once tested ok
-- RAISE NOTICE '%', _sql;

-- execute: uncomment once tested ok
RETURN QUERY EXECUTE _sql
USING  _where_name;   -- $1

END
$func$  LANGUAGE plpgsql;

Call:

SELECT * FROM f_demo();
SELECT * FROM f_demo('sum', TRUE, '%2');    
SELECT * FROM f_demo('avg', FALSE);
SELECT * FROM f_demo(_where_name := '%1_'); -- named param

SQL Fiddle

  • You need a firm understanding of PL/pgSQL; otherwise there is just too much to explain. You'll find related answers here on SO under the plpgsql tag for practically every detail in this answer.

  • All parameters are handled safely; SQL injection is not possible. More:

    • Define table and column names as arguments in a plpgsql function?
    • Table name as a PostgreSQL function parameter
  • Note in particular how a WHERE clause is added conditionally (when _where_name is passed) with the positional parameter $1 in the query string. The value itself is passed to EXECUTE with the USING clause: no type conversion, no escaping, no chance for SQL injection. Examples:

    • Row expansion via "*" is not supported here
    • SQL state: 42601 syntax error at or near "11"
    • Refactor a PL/pgSQL function to return the output of various SELECT queries
  • Use DEFAULT values for function parameters, so you are free to provide any or none. More:

    • Functions with variable number of input parameters
    • The manual on calling functions
  • The function format() is instrumental for building complex dynamic SQL strings in a safe and clean fashion.
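For readers who would rather keep this logic in the Django application than in a database function, here is a rough Python counterpart of f_demo. It is a sketch, not a drop-in replacement. It assumes sqlite3 as a stand-in database (so there are no ::int casts, and ? replaces $1). build_ranking_query, ALLOWED_AGG and the sample rows are hypothetical. The same safeguards apply: the aggregate name must come from a whitelist, the join keyword is one of two constants, and the name pattern is only ever a bound parameter.

```python
import sqlite3

# Hypothetical whitelist, mirroring _agg_op in f_demo.
ALLOWED_AGG = {"count", "sum", "avg"}


def build_ranking_query(agg="sum", left_join=False, where_name=None):
    """Return (sql, params) for the top-ranked rows per team (hypothetical helper)."""
    agg = agg.lower()
    if agg not in ALLOWED_AGG:           # like the RAISE EXCEPTION branch
        raise ValueError(f"agg must be one of {sorted(ALLOWED_AGG)}")
    join = "LEFT JOIN" if left_join else "JOIN"
    where, params = "", []
    if where_name:                       # like CASE WHEN _where_name <> '' ...
        where = "WHERE p.name LIKE ?"
        params.append(where_name)
    sql = f"""
        SELECT * FROM (
            SELECT p.person_id, p.name, p.team, {agg}(s.score) AS score,
                   rank() OVER (PARTITION BY p.team
                                ORDER BY {agg}(s.score) DESC) AS rnk
            FROM   person p
            {join} score s USING (person_id)
            {where}
            GROUP  BY p.person_id
        ) sub
        WHERE  rnk < 3
        ORDER  BY team, rnk, name"""
    return sql, params


# Hypothetical sample data; eve has no scores, so only LEFT JOIN keeps her.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, team TEXT);
    CREATE TABLE score  (person_id INTEGER REFERENCES person, score INTEGER);
    INSERT INTO person VALUES (1, 'ann', 'red'), (2, 'bob', 'red'),
                              (3, 'cy', 'red'), (4, 'dee', 'blue'),
                              (5, 'eve', 'blue');
    INSERT INTO score VALUES (1, 5), (1, 5), (2, 10), (3, 4), (4, 7);
""")

default_rows = conn.execute(*build_ranking_query()).fetchall()
count_rows = conn.execute(*build_ranking_query("count", left_join=True)).fetchall()
filtered_rows = conn.execute(
    *build_ranking_query("count", left_join=True, where_name="%e%")).fetchall()
# A hostile value is just a LIKE pattern here, never SQL.
hostile_rows = conn.execute(
    *build_ranking_query(where_name="x'; DROP TABLE person; --")).fetchall()
```

Only the validated aggregate name and constant fragments are interpolated into the f-string; where_name never is. That is the same split the PL/pgSQL version makes between format() and USING.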

Erwin Brandstetter answered Nov 03 '22 21:11