Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Aggregate multiple tables considering indexing with values derived from one table

Consider I have three tables (bet,win,cancel) that will contain the data for rounds of bets. Each round can have 1 or more bets and 1 or more wins. A round may be cancelled before any wins are recorded.

I would like to aggregate the data so that each round is displayed as one row, however my queries are very slow after testing with millions of dummy records each day.

Below are example data

Bet

bookmaker_id    provider_id round_id    transaction_id  game_id user_id bet_amount  balance bet_timestamp
1               1           Round1      bet_tx1         game1   123     10.00       90.00   2022-03-17 01:40:57.400
1               1           Round2      bet_tx2         game1   123     10.00       70.00   2022-03-17 02:40:57.400
1               1           Round2      bet_tx3         game1   123     10.00       80.00   2022-03-17 03:40:57.400
1               1           Round3      bet_tx4         game1   123     10.00       70.00   2022-03-17 04:40:57.400

Win

bookmaker_id    provider_id round_id    transaction_id  user_id win_amount  balance win_timestamp
1               1           Round1      win_tx1         123     0.00        80.00   2022-03-17 01:40:57.400
1               1           Round2      win_tx2         123     10.00       80.00   2022-03-17 02:40:57.400

Cancel

bookmaker_id    provider_id round_id    transaction_id  user_id cancel_timestamp
1               1           Round3      can_tx1         123     2022-03-17 01:40:57.400

Each table has two indexes:
Primary Key: (bookmaker_id , provider_id , transaction_id)
Index: (bookmaker_id, provider_id, round_id)

The following query attempts to aggregate the round information for the first bet transaction. It totals the amount bet, totals the amount won, links the cancellation record if it exists and appends the transaction_id's together if there are multiple. It also returns the opening balance of the first bet and the closing balance of the last win.

Attempt at query

SELECT 
  bet.*, 
  win.*, 
  cancel.transaction_id as cancel_transaction_id, 
  cancel.cancel_timestamp 
FROM 
  (
    SELECT 
      round_id, 
      game_id, 
      provider_id, 
      bookmaker_id, 
      user_id, 
      STRING_AGG(bet.transaction_id, ', ') as bet_transaction_ids, 
      SUM(bet_amount) as total_bet, 
      MIN(bet_timestamp) as bet_timestamp, 
      opening_balance 
    FROM 
      (
        SELECT 
          FIRST_VALUE(balance) OVER (
            PARTITION BY round_id 
            order by 
              bet_timestamp asc
          ) as opening_balance, 
          * 
        FROM 
          bet 
        WHERE 
          bookmaker_id = 1 
          AND provider_id = 1 
          AND transaction_id = 'bet_tx1'
      ) AS bet 
    GROUP BY 
      round_id, 
      game_id, 
      provider_id, 
      bookmaker_id, 
      user_id, 
      opening_balance
  ) AS bet 
  LEFT JOIN (
    SELECT 
      round_id, 
      STRING_AGG(win.transaction_id, ', ') as win_transaction_ids, 
      SUM(win_amount) as total_won, 
      MAX(win_timestamp) as win_timestamp, 
      closing_balance 
    FROM 
      (
        SELECT 
          FIRST_VALUE(balance) OVER (
            PARTITION BY round_id 
            ORDER BY 
              win_timestamp desc
          ) AS closing_balance, 
          * 
        FROM 
          win
      ) AS win 
    GROUP BY 
      win.round_id, 
      closing_balance
  ) AS win ON bet.round_id = win.round_id 
  LEFT JOIN (
    SELECT 
      [round_id], 
      [transaction_id], 
      [cancel_timestamp] 
    FROM 
      cancel
  ) AS cancel ON bet.round_id = cancel.round_id 
ORDER BY 
  total_won;

Result

round_id    game_id provider_id bookmaker_id    user_id bet_transaction_ids total_bet   bet_timestamp             opening_balance   round_id    win_transaction_ids total_won   win_timestamp             closing_balance   cancel_transaction_id   cancel_timestamp
Round1      game1   1           1               123     bet_tx1             10.00       2022-03-17 01:40:57.400   90.00             Round1      win_tx1             0.00        2022-03-17 01:40:57.400   80.00             null                    null

I assume this is not the most efficient way to link these tables if all I have is the bet transaction id considering the indexes and millions of potential records. I would also like to select multiple rounds filtered by bet_timestamp but I hope I can apply anything learned here to that query too.

How should I query these three tables more efficiently?

Any assistance would be most appreciated.

DB Fiddle Here

like image 382
Fuzz Avatar asked Sep 17 '25 02:09

Fuzz


2 Answers

I do not know if it works, but could you try this one:

First add an index to your bet table like this(keep the current one too):

Index: (bet_timestamp, round_id, bookmaker_id, provider_id)

Also you have primary key like below:

Primary Key: (bookmaker_id , provider_id , transaction_id)

I do not know histogram of your data in columns "bookmaker_id , provider_id" and in this example you are using "transaction_id" as one of key column of the primary key, but it does not provide any relationship between other tables. After checking historgram of the data refreshing architecture could be good.

Please try this(before executing please create index):

WITH openingtab
     AS (SELECT round_id,
                game_id,
                provider_id,
                bookmaker_id,
                user_id,
                String_agg(bet.transaction_id, ', ') AS bet_transaction_ids,
                Sum(bet_amount)                      AS total_bet,
                Min(bet_timestamp)                   openingbet_time
         FROM   bet
         WHERE  bookmaker_id = 1
                AND provider_id = 1
         GROUP  BY round_id,
                   game_id,
                   provider_id,
                   bookmaker_id,
                   user_id),
     bettab
     AS (SELECT b.round_id,
                b.user_id,
                b.game_id,
                b.balance opening_bet,
                b.provider_id,
                b. bookmaker_id,
                o.bet_transaction_ids,
                o.total_bet,
                o.openingbet_time
         FROM   openingtab o
                JOIN bet b
                  ON o.openingbet_time = b.bet_timestamp
                     AND o.round_id = b.round_id
                     AND o.provider_id = b.provider_id
                     AND o.bookmaker_id = b.bookmaker_id),
     betwin
     AS (SELECT b.*,
                w.transaction_id wtid,
                w.win_amount,
                w.balance,
                w.win_timestamp,
                c.transaction_id ctid,
                c.cancel_timestamp
         FROM   bettab b
                LEFT OUTER JOIN win w
                             ON b.round_id = w.round_id
                                AND b.provider_id = w.provider_id
                                AND b.bookmaker_id = w.bookmaker_id
                LEFT OUTER JOIN cancel c
                             ON b.round_id = c.round_id
                                AND b.provider_id = c.provider_id
                                AND b.bookmaker_id = c.bookmaker_id)
SELECT *
FROM   betwin; 
like image 111
Umut TEKİN Avatar answered Sep 18 '25 17:09

Umut TEKİN


First, the subquery of bet might rewrite as below query, because there is a Primary Key: (bookmaker_id , provider_id , transaction_id) which all hit your condition, so that might be no sense to use aggregate & window function because Primary Key will not be duplicated.

SELECT 
  round_id, 
  game_id, 
  provider_id, 
  bookmaker_id, 
  user_id, 
  transaction_id as bet_transaction_ids, 
  bet_amount as total_bet, 
  bet_timestamp as bet_timestamp, 
  balance as opening_balance 
FROM bet
WHERE 
    bookmaker_id = 1 
    AND provider_id = 1 
    AND transaction_id = 'bet_tx1'

The second thing, try to use indexes & filter your result set as possible.

In the below query, I had added conditions bookmaker_id & provider_id on the cancel table when doing OUTER JOIN which might hit your indexes as possible.

because the first column of index is very important that will let QO determine whether use indexes, so try to use first column as well

SELECT 
  bet.*, 
  win.*, 
  cancel.transaction_id as cancel_transaction_id, 
  cancel.cancel_timestamp 
FROM 
  (
    SELECT 
      round_id, 
      game_id, 
      provider_id, 
      bookmaker_id, 
      user_id, 
      transaction_id as bet_transaction_ids, 
      bet_amount as total_bet, 
      bet_timestamp as bet_timestamp, 
      balance as opening_balance 
    FROM bet
    WHERE 
        bookmaker_id = 1 
        AND provider_id = 1 
        AND transaction_id = 'bet_tx1'
  ) AS bet 
  LEFT JOIN (
    SELECT 
      round_id, 
      STRING_AGG(win.transaction_id, ', ') as win_transaction_ids, 
      SUM(win_amount) as total_won, 
      MAX(win_timestamp) as win_timestamp, 
      closing_balance
    FROM 
      (
        SELECT 
          FIRST_VALUE(balance) OVER (
            PARTITION BY round_id 
            ORDER BY 
              win_timestamp desc
          ) AS closing_balance, 
          * 
        FROM 
          win
      ) AS win  
    GROUP BY 
      win.round_id, 
      closing_balance
  ) AS win ON bet.round_id = win.round_id 
  LEFT JOIN cancel ON bet.round_id = cancel.round_id 
  -- add filter condition
  AND bet.bookmaker_id = cancel.bookmaker_id
  AND bet.provider_id = cancel.provider_id
ORDER BY 
  total_won;

Final need to review all of the columns whether you indeed need to try to avoid select * (unless you are using cluster index and really need all of them)

The rewritten query of the execution plan might do cluster seek on bet & cancel table and reduce more cost the original one.

sqlfiddle

like image 45
D-Shih Avatar answered Sep 18 '25 16:09

D-Shih