I want to update table prod_replay_out
based on subquery results in Postgres. However, subquery returns multiple rows but I want to skip those rows and update table based on single rows return by subquery.
I have referred link Subquery returns more than 1 row error but max()
function will not apply for my expected results. Could you please provide me some suggestion to modify query? Thank you.
prod_replay_out
has following columns:
seller, buyer, sender_tag, seller_tag, buyer_tag, isin, quantity, in_msg_time, msg_type, cdsx_time
prod_replay_in
has following columns:
seller, buyer, sender_tag, seller_tag, buyer_tag, isin, quantity, msg_type, cdsx_time
What I have tried?
Please find below update sql:
Update sql:
update prod_replay_out O
set in_msg_id =
(Select id
From prod_replay_in I
Where I.msg_type = 'CDST010'
and I.seller = O.seller
and I.buyer = O.buyer
and I.sender_tag = O.sender_tag
and I.seller_tag = O.seller_tag
and I.buyer_tag = O.buyer_tag
and I.isin = O.isin
and I.quantity = O.quantity
and I.cdsx_time = O.in_msg_time
and I.cdsx_time::text like '2020-05-12%'
)
where O.msg_type = 'CDST01C'
and O.cdsx_time::text like '2020-05-12%';
I have tried below solution. Is it the correct approach or is there any loophole?
update prod_replay_out O
set in_msg_id =
(Select id
From prod_replay_in I
Where I.msg_type = 'CDST010'
and I.seller = O.seller
and I.buyer = O.buyer
and I.sender_tag = O.sender_tag
and I.seller_tag = O.seller_tag
and I.buyer_tag = O.buyer_tag
and I.isin = O.isin
and I.quantity = O.quantity
and I.cdsx_time = O.in_msg_time
and I.cdsx_time::text like '2020-05-12%'
and 1 = (Select count(id)
From prod_replay_in I
Where I.msg_type = 'CDST010'
and I.seller = O.seller
and I.buyer = O.buyer
and I.sender_tag = O.sender_tag
and I.seller_tag = O.seller_tag
and I.buyer_tag = O.buyer_tag
and I.isin = O.isin
and I.quantity = O.quantity
and I.cdsx_time = O.in_msg_time
and I.cdsx_time::text like '2020-05-12%'
)
)
where O.msg_type = 'CDST01C'
and O.cdsx_time::text like '2020-05-12%';
Multiple-row subqueries are nested queries that can return more than one row of results to the parent query. Multiple-row subqueries are used most commonly in WHERE and HAVING clauses. Since it returns multiple rows, it must be handled by set comparison operators (IN, ALL, ANY).
Technically, to repair your statement, you can add LIMIT 1 to the subquery to ensure that at most 1 row is returned. That would remove the error, your code would still be nonsense.
There are essentially two options to take. The first is to rewrite the query so that the subquery will only return a single row, thus eliminating the source of the error. This will require a user to rethink how they wish to acquire the data that they were initially searching for.
EXISTS. The argument of EXISTS is an arbitrary SELECT statement, or subquery. The subquery is evaluated to determine whether it returns any rows. If it returns at least one row, the result of EXISTS is “true”; if the subquery returns no rows, the result of EXISTS is “false”.
Most importantly, don't use a correlated subquery. It's the inferior tool for the job. Use a subquery in the FROM
clause.
This only updates where a single matching candidate row is found in the source table (neither none nor multiple), and only where it actually changes the value:
UPDATE prod_replay_out o
SET in_msg_id = i.id
FROM (
SELECT i.id, i.seller, i.buyer, i.sender_tag, i.seller_tag, i.buyer_tag, i.isin, i.quantity, i.cdsx_time
FROM prod_replay_in i
WHERE i.msg_type = 'CDST010'
AND i.cdsx_time >= '2020-05-12' -- ① "sargable" expression
AND i.cdsx_time < '2020-05-13' -- ② don't cast to date, it's a valid timestamp literal
AND NOT EXISTS ( -- ③ EXISTS is typically faster than counting
SELECT FROM prod_replay_in x
WHERE x.id <> i.id -- ④ unique
AND (i.seller, i.buyer, i.sender_tag, i.seller_tag, i.buyer_tag, i.isin, i.quantity, i.cdsx_time) -- ⑤ short syntax
= (x.seller, x.buyer, x.sender_tag, x.seller_tag, x.buyer_tag, x.isin, x.quantity, x.cdsx_time)
)
) i
WHERE o.msg_type = 'CDST01C'
AND (i.seller, i.buyer, i.sender_tag, i.seller_tag, i.buyer_tag, i.isin, i.quantity, i.cdsx_time)
= (o.seller, o.buyer, o.sender_tag, o.seller_tag, o.buyer_tag, o.isin, o.quantity, o.in_msg_time) -- ⑥ o.cdsx_time?
-- AND o.cdsx_time >= '2020-05-12' -- ⑦ redundant
-- AND o.cdsx_time < '2020-05-13'
AND o.in_msg_id IS DISTINCT FROM i.id -- ⑧ avoid empty updates
;
① Like GMB already suggested, transform this predicate to "sargable" expressions. This is faster, generally, and can use index support.
② But don't cast to date
if cdsx_time
is a timestamp
column (as seems likely). '2020-05-12'
is a perfectly valid timestamp literal, signifying the first instance of the day. See:
If it's a timestamptz
column, consider the possible influence of the timezone
setting! See:
③ EXISTS
is typically (much) more efficient than counting all rows, as it can stop as soon as another row is found. Especially if there can be many peers, and index support is available. See:
④ Assuming id
is unique (or PK). Else use the system column ctid
for the job. See:
⑤ Convenient, equivalent short syntax with ROW values. See:
⑥ Your query has:
and I.cdsx_time = O.in_msg_time -- !?
and I.cdsx_time::text like '2020-05-12%'
... but:
O.cdsx_time::text like '2020-05-12%'
You didn't mean to write and I.cdsx_time = O.cdsx_time
?
⑦ Would be noise. The restriction is already enforced in the subquery. (Doesn't help index support, either.)
⑧ This one is important if some columns may already have the desired value. Then the operation is skipped instead of writing an identical row version at full cost.
If both columns are defined NOT NULL
, simplify to o.in_msg_id <> i.id
. Again, see:
If performance is an issue or you run this repeatedly, consider indices like the following:
For the first (in order of the expected query plan!) step of identifying source row candidates:
CREATE INDEX foo ON prod_replay_in (msg_type, cdsx_time);
For the second step of ruling out duplicates:
CREATE INDEX foo ON prod_replay_in (seller, buyer, sender_tag, seller_tag, buyer_tag, isin, quantity, cdsx_time);
Or any small subset that is selective enough. A smaller index on fewer columns is typically more efficient if it includes relatively few additional rows as "false positives" in the index scan. While relatively few, these are eliminated cheaply in the following FILTER
step.
For the final step of identifying target rows:
CREATE INDEX foo ON prod_replay_out (msg_type, in_msg_time);
Again: or any small subset that is selective enough.
You want to update only when the subquery returns one row. One option uses aggreation and having
in the subquery:
update prod_replay_out o
set in_msg_id = (
select max(id)
from prod_replay_in i
where
i.msg_type = 'cdst010'
and i.seller = o.seller
and i.buyer = o.buyer
and i.sender_tag = o.sender_tag
and i.seller_tag = o.seller_tag
and i.buyer_tag = o.buyer_tag
and i.isin = o.isin
and i.quantity = o.quantity
and i.cdsx_time = o.in_msg_time
and i.cdsx_time >= '2020-05-12'::date
and i.cdsx_time < '2020-05-13'::date
having count(*) = 1
)
where
o.msg_type = 'cdst01c'
and o.cdsx_time >= '2020-05-12'::date
and o.cdsx_time < '2020-05-13'::date
Note that I rewrote the date filters to avoid the conversion to text (you can use an half-open interval with date literals instead, which is by far more efficient).
Note that this updates in_msg_id
to null
when the subquery would have returned multiple rows (or no rows at all). If you want to avoid that, you can filter in the where
clause:
update prod_replay_out o
set in_msg_id = (
select max(id)
from prod_replay_in i
where
i.msg_type = 'cdst010'
and i.seller = o.seller
and i.buyer = o.buyer
and i.sender_tag = o.sender_tag
and i.seller_tag = o.seller_tag
and i.buyer_tag = o.buyer_tag
and i.isin = o.isin
and i.quantity = o.quantity
and i.cdsx_time = o.in_msg_time
and i.cdsx_time >= '2020-05-12'::date
and i.cdsx_time < '2020-05-13'::date
having count(*) = 1
)
where
o.msg_type = 'cdst01c'
and o.cdsx_time >= '2020-05-12'::date
and o.cdsx_time < '2020-05-13'::date
and (
select count(*)
from prod_replay_in i
where
i.msg_type = 'cdst010'
and i.seller = o.seller
and i.buyer = o.buyer
and i.sender_tag = o.sender_tag
and i.seller_tag = o.seller_tag
and i.buyer_tag = o.buyer_tag
and i.isin = o.isin
and i.quantity = o.quantity
and i.cdsx_time = o.in_msg_time
and i.cdsx_time >= '2020-05-12'::date
and i.cdsx_time < '2020-05-13'::date
) = 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With