Calculating the cost of Block Nested Loop Joins

Tags: sql, database, mysql

I am trying to calculate the cost of the (most efficient) block nested loop join in terms of NPDR (number of disk page reads). Suppose you have a query of the form:

SELECT COUNT(*)
FROM county JOIN mcd
ON county.state_code = mcd.state_code
AND county.fips_code = mcd.fips_code
WHERE county.state_code = @NO

where @NO is replaced by a state code on each execution of the query.

I know that I can derive the NPDR using:

NPDR(R x S) = |Pages(R)| + Pages(R) / B - 2 * |Pages(S)|

(where the smaller table is used as the outer relation in order to produce fewer page reads; hence R = county and S = mcd).
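As a quick sanity check of the join-order claim, here is a minimal Python sketch of the generic block nested loop cost, using B - 2 buffer pages per outer block and rounding the number of blocks up. The page counts are hypothetical: they assume 100-byte rows for both tables, since the question does not give row sizes; B = 100 is the question's value.

```python
def bnlj_page_reads(outer_pages: int, inner_pages: int, buffer_pages: int) -> int:
    """Page reads for a block nested loop join with B - 2 pages per outer block."""
    block = buffer_pages - 2
    if block < 1:
        raise ValueError("need at least 3 buffer pages")
    outer_blocks = (outer_pages + block - 1) // block  # integer ceiling
    # The outer relation is read once; the inner one is rescanned per block.
    return outer_pages + outer_blocks * inner_pages

# Hypothetical page counts (assumed 100-byte rows, 2048-byte pages).
COUNTY_PAGES = 154
MCD_PAGES = 1724
B = 100

county_outer = bnlj_page_reads(COUNTY_PAGES, MCD_PAGES, B)
mcd_outer = bnlj_page_reads(MCD_PAGES, COUNTY_PAGES, B)
print(county_outer, mcd_outer)  # 3602 4496
```

The product term (blocks of the outer times pages of the inner) comes out roughly the same in both orders, while the outer relation's own pages are added only once, so with these made-up numbers the smaller table as outer wins.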

I also know the following:

Page size = 2048 bytes
Pointer = 8 bytes
Num. rows in mcd table = 35298
Num. rows in county table = 3141
Free memory buffer pages B = 100
Pages(X) = (rowsize)(numrows) / pagesize
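The Pages(X) formula can be written as a small Python helper. The row size is not given in the question, so the 100-byte value below is purely hypothetical. The result is rounded up because a partially filled page still costs a full read. This follows the question's formula, which lets rows span page boundaries; a layout that stores only whole rows per page would instead use floor(pagesize / rowsize) rows per page.

```python
PAGE_SIZE = 2048  # bytes, from the question

def pages(row_size: int, num_rows: int, page_size: int = PAGE_SIZE) -> int:
    """Pages(X) = (rowsize)(numrows) / pagesize, rounded up to whole pages."""
    total_bytes = row_size * num_rows
    return (total_bytes + page_size - 1) // page_size  # integer ceiling

ROW_SIZE = 100  # hypothetical: same row size assumed for both tables
county_pages = pages(ROW_SIZE, 3141)
mcd_pages = pages(ROW_SIZE, 35298)
print(county_pages, mcd_pages)  # 154 1724
```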

What I am trying to figure out is how the WHERE county.state_code = @NO clause affects my cost.

Thanks for your time.

asked Nov 22 '12 21:11

JB2

1 Answer

First, a couple of observations about the formula you wrote:

- I'm not sure why you write "B - 2" instead of "B - 1". From a theoretical perspective, you need only a single buffer page to read in relation S (you can read it one page at a time). The usual textbook derivation reserves B - 2 pages for R because it also sets aside one page as the output buffer; since this query only returns a count, that output page is arguably unnecessary.
- Make sure you use all the brackets. I would write the formula as:
  NPDR(R x S) = |Pages(R)| + |Pages(R)| / (B-2) * |Pages(S)|
- All fractional quantities in the formula (the page counts and the number of blocks of R) need to be rounded up, but this is nitpicking.
- The explanation for the generic BNLJ formula:
  - You read in as many tuples from the smaller relation (R) as you can keep in memory (B-1 or B-2 pages' worth of tuples).
  - For each block of B-2 pages' worth of tuples, you then have to read the whole S relation (|Pages(S)| pages) to perform the join for that particular block of R.
  - At the end of the join, relation R has been read exactly once and relation S has been read once per buffer fill, namely |Pages(R)| / (B-2) times (rounded up).
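To make the loop structure concrete, here is a toy Python simulation of block nested loops (an illustration of the technique, not how any particular DBMS implements it). Relations are modeled as lists of pages, each page a list of tuples, and every page fetch counts as one read. With 5 outer pages, 3 inner pages and B = 4, the block size is B - 2 = 2, so S is scanned ceil(5 / 2) = 3 times and the total is 5 + 3 * 3 = 14 reads, matching the formula.

```python
from typing import Callable, List, Tuple

Page = List[tuple]

def block_nested_loop_join(outer: List[Page], inner: List[Page], buffer_pages: int,
                           match: Callable[[tuple, tuple], bool]) -> Tuple[int, int]:
    """Return (number of matching pairs, number of page reads)."""
    block_size = buffer_pages - 2  # one page for reading S, one for output
    if block_size < 1:
        raise ValueError("need at least 3 buffer pages")
    reads = 0
    matches = 0
    for start in range(0, len(outer), block_size):
        block = outer[start:start + block_size]
        reads += len(block)  # each page of R is read exactly once overall
        block_tuples = [t for page in block for t in page]
        for inner_page in inner:  # full scan of S for every block of R
            reads += 1
            for s in inner_page:
                for r in block_tuples:
                    if match(r, s):
                        matches += 1
    return matches, reads

# Toy data: 5 outer pages with one key each, 3 inner pages.
outer = [[(i,)] for i in range(5)]
inner = [[(0,), (1,)], [(2,), (9,)], [(4,)]]
result = block_nested_loop_join(outer, inner, 4, lambda r, s: r[0] == s[0])
print(result)  # (4, 14)
```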

Now the answer:

- In your example, a selection criterion is applied to relation R (table county in this case): the WHERE county.state_code = @NO part of the query. Therefore, the generic formula does not apply directly.
- When reading relation R (table county in your example), we can discard all non-qualifying tuples, i.e., those that do not match the selection criterion. Assuming that there are 50 states in the USA and that all states have the same number of counties, only 2% of the tuples in table county qualify on average (3141 / 50, or about 63 rows) and need to be stored in memory. This reduces the number of outer-loop iterations of the join, i.e., the number of times we need to scan relation S (table mcd). Table county itself is still read in full, assuming there is no index on county.state_code. The 2% figure is only the expected average and will change depending on the actual state.
- The formula for your problem therefore becomes:
  NPDR(R x S) = |Pages(county)| + |Pages(county)| / (B - 2) * |Counties in state @NO| / |Rows in table county| * |Pages(mcd)|
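The adjusted formula can be sketched in Python with the rounding applied. This sketch assumes that county has no index on state_code (so all of its pages are still read) and that qualifying tuples take up the same fraction of pages as they do of rows. It reuses hypothetical 100-byte rows (154 pages for county, 1724 for mcd); the qualifying row counts are illustrative, not data.

```python
def filtered_bnlj_page_reads(outer_pages: int, outer_rows: int, qualifying_rows: int,
                             inner_pages: int, buffer_pages: int) -> int:
    """Block nested loop cost when a selection on R is applied while R is read.

    R (county) is scanned in full; only qualifying tuples are kept in the
    B - 2 page buffer, so S (mcd) is scanned once per filled buffer.
    """
    block = buffer_pages - 2
    if block < 1:
        raise ValueError("need at least 3 buffer pages")
    # Pages' worth of qualifying tuples, rounded up (integer ceiling).
    qualifying_pages = (outer_pages * qualifying_rows + outer_rows - 1) // outer_rows
    outer_blocks = (qualifying_pages + block - 1) // block
    return outer_pages + outer_blocks * inner_pages

COUNTY_PAGES, COUNTY_ROWS, MCD_PAGES, B = 154, 3141, 1724, 100

average_state = filtered_bnlj_page_reads(COUNTY_PAGES, COUNTY_ROWS, 63, MCD_PAGES, B)
no_filter = filtered_bnlj_page_reads(COUNTY_PAGES, COUNTY_ROWS, COUNTY_ROWS, MCD_PAGES, B)
print(average_state, no_filter)  # 1878 3602
```

With these hypothetical sizes, even a state with 254 counties needs only 13 pages' worth of buffer space, so any single state fits in one 98-page block, mcd is scanned once, and the cost drops from 3602 to 1878 reads. With a much smaller buffer, the number of mcd scans would depend more visibly on the chosen state.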

answered Sep 23 '22 03:09

Radu
