Consider I have three tables (bet,win,cancel) that will contain the data for rounds of bets. Each round can have 1 or more bets and 1 or more wins. A round may be cancelled before any wins are recorded.
I would like to aggregate the data so that each round is displayed as one row, however my queries are very slow after testing with millions of dummy records each day.
Below are example data
Bet
bookmaker_id provider_id round_id transaction_id game_id user_id bet_amount balance bet_timestamp
1 1 Round1 bet_tx1 game1 123 10.00 90.00 2022-03-17 01:40:57.400
1 1 Round2 bet_tx2 game1 123 10.00 70.00 2022-03-17 02:40:57.400
1 1 Round2 bet_tx3 game1 123 10.00 80.00 2022-03-17 03:40:57.400
1 1 Round3 bet_tx4 game1 123 10.00 70.00 2022-03-17 04:40:57.400
Win
bookmaker_id provider_id round_id transaction_id user_id win_amount balance win_timestamp
1 1 Round1 win_tx1 123 0.00 80.00 2022-03-17 01:40:57.400
1 1 Round2 win_tx2 123 10.00 80.00 2022-03-17 02:40:57.400
Cancel
bookmaker_id provider_id round_id transaction_id user_id cancel_timestamp
1 1 Round3 can_tx1 123 2022-03-17 01:40:57.400
Each table has two indexes:
Primary Key: (bookmaker_id , provider_id , transaction_id)
Index: (bookmaker_id, provider_id, round_id)
The following query attempts to aggregate the round information for the first bet transaction. It totals the amount bet, totals the amount won, links the cancellation record if it exists and appends the transaction_id's together if there are multiple. It also returns the opening balance of the first bet and the closing balance of the last win.
Attempt at query
SELECT
bet.*,
win.*,
cancel.transaction_id as cancel_transaction_id,
cancel.cancel_timestamp
FROM
(
SELECT
round_id,
game_id,
provider_id,
bookmaker_id,
user_id,
STRING_AGG(bet.transaction_id, ', ') as bet_transaction_ids,
SUM(bet_amount) as total_bet,
MIN(bet_timestamp) as bet_timestamp,
opening_balance
FROM
(
SELECT
FIRST_VALUE(balance) OVER (
PARTITION BY round_id
order by
bet_timestamp asc
) as opening_balance,
*
FROM
bet
WHERE
bookmaker_id = 1
AND provider_id = 1
AND transaction_id = 'bet_tx1'
) AS bet
GROUP BY
round_id,
game_id,
provider_id,
bookmaker_id,
user_id,
opening_balance
) AS bet
LEFT JOIN (
SELECT
round_id,
STRING_AGG(win.transaction_id, ', ') as win_transaction_ids,
SUM(win_amount) as total_won,
MAX(win_timestamp) as win_timestamp,
closing_balance
FROM
(
SELECT
FIRST_VALUE(balance) OVER (
PARTITION BY round_id
ORDER BY
win_timestamp desc
) AS closing_balance,
*
FROM
win
) AS win
GROUP BY
win.round_id,
closing_balance
) AS win ON bet.round_id = win.round_id
LEFT JOIN (
SELECT
[round_id],
[transaction_id],
[cancel_timestamp]
FROM
cancel
) AS cancel ON bet.round_id = cancel.round_id
ORDER BY
total_won;
Result
round_id game_id provider_id bookmaker_id user_id bet_transaction_ids total_bet bet_timestamp opening_balance round_id win_transaction_ids total_won win_timestamp closing_balance cancel_transaction_id cancel_timestamp
Round1 game1 1 1 123 bet_tx1 10.00 2022-03-17 01:40:57.400 90.00 Round1 win_tx1 0.00 2022-03-17 01:40:57.400 80.00 null null
I assume this is not the most efficient way to link these tables if all I have is the bet transaction id considering the indexes and millions of potential records. I would also like to select multiple rounds filtered by bet_timestamp but I hope I can apply anything learned here to that query too.
How should I query these three tables more efficiently?
Any assistance would be most appreciated.
DB Fiddle Here
I do not know if it works, but could you try this one:
First add an index to your bet table like this(keep the current one too):
Index: (bet_timestamp, round_id, bookmaker_id, provider_id)
Also you have primary key like below:
Primary Key: (bookmaker_id , provider_id , transaction_id)
I do not know histogram of your data in columns "bookmaker_id , provider_id" and in this example you are using "transaction_id" as one of key column of the primary key, but it does not provide any relationship between other tables. After checking historgram of the data refreshing architecture could be good.
Please try this(before executing please create index):
WITH openingtab
AS (SELECT round_id,
game_id,
provider_id,
bookmaker_id,
user_id,
String_agg(bet.transaction_id, ', ') AS bet_transaction_ids,
Sum(bet_amount) AS total_bet,
Min(bet_timestamp) openingbet_time
FROM bet
WHERE bookmaker_id = 1
AND provider_id = 1
GROUP BY round_id,
game_id,
provider_id,
bookmaker_id,
user_id),
bettab
AS (SELECT b.round_id,
b.user_id,
b.game_id,
b.balance opening_bet,
b.provider_id,
b. bookmaker_id,
o.bet_transaction_ids,
o.total_bet,
o.openingbet_time
FROM openingtab o
JOIN bet b
ON o.openingbet_time = b.bet_timestamp
AND o.round_id = b.round_id
AND o.provider_id = b.provider_id
AND o.bookmaker_id = b.bookmaker_id),
betwin
AS (SELECT b.*,
w.transaction_id wtid,
w.win_amount,
w.balance,
w.win_timestamp,
c.transaction_id ctid,
c.cancel_timestamp
FROM bettab b
LEFT OUTER JOIN win w
ON b.round_id = w.round_id
AND b.provider_id = w.provider_id
AND b.bookmaker_id = w.bookmaker_id
LEFT OUTER JOIN cancel c
ON b.round_id = c.round_id
AND b.provider_id = c.provider_id
AND b.bookmaker_id = c.bookmaker_id)
SELECT *
FROM betwin;
First, the subquery of bet
might rewrite as below query, because there is a Primary Key: (bookmaker_id , provider_id , transaction_id)
which all hit your condition, so that might be no sense to use aggregate & window function because Primary Key will not be duplicated.
SELECT
round_id,
game_id,
provider_id,
bookmaker_id,
user_id,
transaction_id as bet_transaction_ids,
bet_amount as total_bet,
bet_timestamp as bet_timestamp,
balance as opening_balance
FROM bet
WHERE
bookmaker_id = 1
AND provider_id = 1
AND transaction_id = 'bet_tx1'
The second thing, try to use indexes & filter your result set as possible.
In the below query, I had added conditions bookmaker_id
& provider_id
on the cancel
table when doing OUTER JOIN
which might hit your indexes as possible.
because the first column of index is very important that will let QO determine whether use indexes, so try to use first column as well
SELECT
bet.*,
win.*,
cancel.transaction_id as cancel_transaction_id,
cancel.cancel_timestamp
FROM
(
SELECT
round_id,
game_id,
provider_id,
bookmaker_id,
user_id,
transaction_id as bet_transaction_ids,
bet_amount as total_bet,
bet_timestamp as bet_timestamp,
balance as opening_balance
FROM bet
WHERE
bookmaker_id = 1
AND provider_id = 1
AND transaction_id = 'bet_tx1'
) AS bet
LEFT JOIN (
SELECT
round_id,
STRING_AGG(win.transaction_id, ', ') as win_transaction_ids,
SUM(win_amount) as total_won,
MAX(win_timestamp) as win_timestamp,
closing_balance
FROM
(
SELECT
FIRST_VALUE(balance) OVER (
PARTITION BY round_id
ORDER BY
win_timestamp desc
) AS closing_balance,
*
FROM
win
) AS win
GROUP BY
win.round_id,
closing_balance
) AS win ON bet.round_id = win.round_id
LEFT JOIN cancel ON bet.round_id = cancel.round_id
-- add filter condition
AND bet.bookmaker_id = cancel.bookmaker_id
AND bet.provider_id = cancel.provider_id
ORDER BY
total_won;
Final need to review all of the columns whether you indeed need to try to avoid select *
(unless you are using cluster index and really need all of them)
The rewritten query of the execution plan might do cluster seek on bet
& cancel
table and reduce more cost the original one.
sqlfiddle
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With