How to use SELECT DISTINCT with RANDOM() function in PostgreSQL?

I am trying to run a SQL query to get four random items. Because the table product_filter has more than one tuple per product, I have to use DISTINCT in the SELECT, and then I get this error:

for SELECT DISTINCT, ORDER BY expressions must appear in select list

But if I put RANDOM() in my SELECT list, it defeats the DISTINCT, because every row then carries a different random value.

Does anyone know how to use DISTINCT with the RANDOM() function? Below is my problematic query.

SELECT DISTINCT
    p.id, 
    p.title
FROM
    product_filter pf
    JOIN product p ON pf.cod_product = p.cod
    JOIN filters f ON pf.cod_filter = f.cod
WHERE
    p.visible = TRUE
LIMIT 4
ORDER BY RANDOM();
Marcio Mazzucato asked Jul 09 '12 18:07


2 Answers

You can either use a subquery (PostgreSQL versions before 16 require an alias for it):

SELECT * FROM (
    SELECT DISTINCT p.cod, p.title ... JOIN... WHERE
    ) AS sub ORDER BY RANDOM() LIMIT 4;

or try GROUP BY on those same fields:

SELECT p.cod, p.title, MIN(RANDOM()) AS o FROM ... JOIN ...
    WHERE ... GROUP BY p.cod, p.title ORDER BY o LIMIT 4;

Which of the two queries runs faster depends on table structure and indexing. With proper indexing on cod and title, the subquery version will probably run faster: cod and title can be read from the index, and cod is the only key needed for the JOIN. So if you have an index on title, cod and visible (the column used in the WHERE), it is likely that the physical table will not even be accessed at all.

I am not so sure whether this would happen with the second expression too.
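
For reference, the two variants written out in full against the tables from the question might look like the sketch below. The join and the WHERE clause are copied from the original query; the alias sub is only a placeholder name and not something from the question.

```sql
-- Variant 1: remove duplicates in a subquery, then shuffle the distinct rows.
-- "sub" is a placeholder alias.
SELECT sub.cod, sub.title
FROM (
    SELECT DISTINCT p.cod, p.title
    FROM   product_filter pf
    JOIN   product p ON pf.cod_product = p.cod
    JOIN   filters f ON pf.cod_filter = f.cod
    WHERE  p.visible = TRUE
) AS sub
ORDER BY RANDOM()
LIMIT 4;

-- Variant 2: collapse duplicates with GROUP BY and order by one random
-- value per group (the smallest of the per-row random values).
SELECT p.cod, p.title, MIN(RANDOM()) AS o
FROM   product_filter pf
JOIN   product p ON pf.cod_product = p.cod
JOIN   filters f ON pf.cod_filter = f.cod
WHERE  p.visible = TRUE
GROUP BY p.cod, p.title
ORDER BY o
LIMIT 4;
```

Prefixing each statement with EXPLAIN ANALYZE on your own data is the most direct way to see which one is faster and whether the table itself gets read.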

LSerni answered Oct 26 '22 07:10


You can simplify your query to avoid the problem a priori:

SELECT p.cod, p.title
FROM   product p
WHERE  p.visible
AND    EXISTS (
    SELECT 1
    FROM   product_filter pf
    JOIN   filters f ON f.cod = pf.cod_filter
    WHERE  pf.cod_product = p.cod
    )
ORDER  BY random()
LIMIT  4;

Major points:

  • You have only columns from table product in the result; the other tables are only checked for the existence of a matching row. For a case like this, the EXISTS semi-join is likely the fastest and simplest solution. Using it does not multiply rows from the base table product, so you don't need to remove them again with DISTINCT.

  • LIMIT has to come last, after ORDER BY.

  • I simplified WHERE p.visible = 't' to p.visible, because this should be a boolean column.
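
To make the first point concrete, here is a minimal sketch with made-up sample data. The rows and the use of temporary tables are assumptions for illustration; only the table and column names come from the question. It shows the plain join returning a product once per matching filter, while the EXISTS form returns each product only once.

```sql
-- Hypothetical sample data; only table and column names are taken from the question.
CREATE TEMP TABLE product (cod int PRIMARY KEY, title text, visible boolean);
CREATE TEMP TABLE filters (cod int PRIMARY KEY);
CREATE TEMP TABLE product_filter (cod_product int, cod_filter int);

INSERT INTO product VALUES (1, 'A', true), (2, 'B', true), (3, 'C', false);
INSERT INTO filters VALUES (10), (20);
INSERT INTO product_filter VALUES (1, 10), (1, 20), (2, 10);

-- Plain join: product 1 matches two filters, so it appears twice (3 rows in total).
SELECT p.cod, p.title
FROM   product_filter pf
JOIN   product p ON pf.cod_product = p.cod
JOIN   filters f ON pf.cod_filter = f.cod
WHERE  p.visible;

-- EXISTS semi-join: each visible product with at least one filter appears once (2 rows).
SELECT p.cod, p.title
FROM   product p
WHERE  p.visible
AND    EXISTS (
    SELECT 1
    FROM   product_filter pf
    JOIN   filters f ON f.cod = pf.cod_filter
    WHERE  pf.cod_product = p.cod
    );
```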

Erwin Brandstetter answered Oct 26 '22 06:10