I need to retrieve all rows from a table where 2 columns combined are all different. So I want all the sales that do not have any other sales that happened on the same day for the same price. The sales that are unique based on day and price will get updated to an active status.
So I'm thinking:
UPDATE sales SET status = 'ACTIVE' WHERE id IN (SELECT DISTINCT (saleprice, saledate), id, count(id) FROM sales HAVING count = 1)
But my brain hurts going any farther than that.
Answer. Yes, the DISTINCT clause can be applied to any valid SELECT query. It is important to note that DISTINCT will filter out all rows that are not unique in terms of all selected columns.
Yes, DISTINCT works on all combinations of column values for all columns in the SELECT clause.
To get unique or distinct values of a column in MySQL Table, use the following SQL Query. SELECT DISTINCT(column_name) FROM your_table_name; You can select distinct values for one or more columns. The column names has to be separated with comma.
SELECT DISTINCT a,b,c FROM t
is roughly equivalent to:
SELECT a,b,c FROM t GROUP BY a,b,c
It's a good idea to get used to the GROUP BY syntax, as it's more powerful.
For your query, I'd do it like this:
UPDATE sales SET status='ACTIVE' WHERE id IN ( SELECT id FROM sales S INNER JOIN ( SELECT saleprice, saledate FROM sales GROUP BY saleprice, saledate HAVING COUNT(*) = 1 ) T ON S.saleprice=T.saleprice AND s.saledate=T.saledate )
If you put together the answers so far, clean up and improve, you would arrive at this superior query:
UPDATE sales SET status = 'ACTIVE' WHERE (saleprice, saledate) IN ( SELECT saleprice, saledate FROM sales GROUP BY saleprice, saledate HAVING count(*) = 1 );
Which is much faster than either of them. Nukes the performance of the currently accepted answer by factor 10 - 15 (in my tests on PostgreSQL 8.4 and 9.1).
But this is still far from optimal. Use a NOT EXISTS
(anti-)semi-join for even better performance. EXISTS
is standard SQL, has been around forever (at least since PostgreSQL 7.2, long before this question was asked) and fits the presented requirements perfectly:
UPDATE sales s SET status = 'ACTIVE' WHERE NOT EXISTS ( SELECT FROM sales s1 -- SELECT list can be empty for EXISTS WHERE s.saleprice = s1.saleprice AND s.saledate = s1.saledate AND s.id <> s1.id -- except for row itself ) AND s.status IS DISTINCT FROM 'ACTIVE'; -- avoid empty updates. see below
db<>fiddle here
Old SQL Fiddle
If you don't have a primary or unique key for the table (id
in the example), you can substitute with the system column ctid
for the purpose of this query (but not for some other purposes):
AND s1.ctid <> s.ctid
Every table should have a primary key. Add one if you didn't have one, yet. I suggest a serial
or an IDENTITY
column in Postgres 10+.
Related:
The subquery in the EXISTS
anti-semi-join can stop evaluating as soon as the first dupe is found (no point in looking further). For a base table with few duplicates this is only mildly more efficient. With lots of duplicates this becomes way more efficient.
For rows that already have status = 'ACTIVE'
this update would not change anything, but still insert a new row version at full cost (minor exceptions apply). Normally, you do not want this. Add another WHERE
condition like demonstrated above to avoid this and make it even faster:
If status
is defined NOT NULL
, you can simplify to:
AND status <> 'ACTIVE';
The data type of the column must support the <>
operator. Some types like json
don't. See:
This query (unlike the currently accepted answer by Joel) does not treat NULL values as equal. The following two rows for (saleprice, saledate)
would qualify as "distinct" (though looking identical to the human eye):
(123, NULL) (123, NULL)
Also passes in a unique index and almost anywhere else, since NULL values do not compare equal according to the SQL standard. See:
OTOH, GROUP BY
, DISTINCT
or DISTINCT ON ()
treat NULL values as equal. Use an appropriate query style depending on what you want to achieve. You can still use this faster query with IS NOT DISTINCT FROM
instead of =
for any or all comparisons to make NULL compare equal. More:
If all columns being compared are defined NOT NULL
, there is no room for disagreement.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With