Blocking factor in a DBMS

Tags: database, indexing

The bit I looked at said it was the floored number of records per block (so floor(B/R)), where B is the block size and R is the record size. I was just wondering, can someone tell me the main reason it's used, and also whether it is actually FLOORED?

My understanding of FLOORED is 1.5 gets floored to 1.0, for anyone that is wondering.


asked Apr 07 '13 05:04

Sim

1 Answer

Yes, it means how many whole records fit into a block.

(A block is the smallest unit of data that the underlying storage system (HDD, SAN, file system, etc.) is willing to deal in. Its size is traditionally 512 bytes for hard drives.)

It is floored because if 100 and a half records would fit, only 100 whole records are stored per block. A record is not split across two blocks here.

Blocking factor is used pretty heavily in many DBMS-related calculations.
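As a rough sketch of why flooring is the natural choice, integer division already gives the number of whole records per block. The function name `blocking_factor` and the sizes used (a 512-byte block, 5-byte records) are made up for illustration, and the sketch assumes records never span two blocks.

```python
def blocking_factor(block_size: int, record_size: int) -> int:
    """Whole records per block, assuming a record never spans two blocks."""
    if block_size <= 0 or record_size <= 0:
        raise ValueError("sizes must be positive")
    # For positive integers, // is the same as floor(block_size / record_size).
    return block_size // record_size


# Hypothetical sizes: 512 / 5 = 102.4, so only 102 whole records fit.
bfr = blocking_factor(512, 5)
wasted = 512 - bfr * 5  # bytes left unused at the end of each block
print(bfr, wasted)
```

The leftover bytes are simply unused in this model, which is why the fractional part is dropped rather than rounded.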

For example:

The problem

We have 10 000 000 records. Each record is 80 bytes long. Each record contains a unique key (let's say a social security number). We want looking up someone by their social security number to be fast.

But what is fast?

We need something to measure performance by. The thing that takes the most time is requesting a block from the hard disk. You know, it is a mechanical device. It has to reposition its head and so on, so it is a really slow operation compared to how fast the CPU is, or compared to how fast main memory (RAM) access is. Okay, let's say that we measure the performance of an operation by how many disk accesses it takes. We want to minimize the number of disk accesses. Okay, now we know how to tell if something is slow or fast.

Many disk accesses -> bad

Very few disk accesses -> good

Calculating how many blocks our data needs

Let's say that on our imaginary hardware, each block is 5000 bytes. We want to calculate how many blocks we need. First, we need to know how many records fit into a single block:

Blocking factor = floored((Block size)/(Record size)) = floored(5000/80) = floored(62.5) = 62 records/block

And we have 10000000 records, so we need ceiled(10000000/62)=ceiled(161290.32)=161291 blocks to store all that data.
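The two steps above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the constant names are mine, and it assumes every block except possibly the last is filled with exactly 62 records.

```python
BLOCK_SIZE = 5000         # bytes per block (imaginary hardware)
RECORD_SIZE = 80          # bytes per record
NUM_RECORDS = 10_000_000

# Floor: only whole records count.
records_per_block = BLOCK_SIZE // RECORD_SIZE

# Ceiling: a partly filled last block still occupies a whole block.
# (n + d - 1) // d is integer ceiling division for positive integers.
data_blocks = (NUM_RECORDS + records_per_block - 1) // records_per_block

unused_bytes_per_block = BLOCK_SIZE - records_per_block * RECORD_SIZE

print(records_per_block, data_blocks, unused_bytes_per_block)
```

Note the asymmetry: records per block is rounded down, while the number of blocks is rounded up.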

Whoa, that's a lot of data. How do I look up someone fast?

If one were to read all the blocks to find a single record by the key (social security number), then that would take up to 161291 disk accesses. Not good.

We can do better. Let's build an index file. We will build a sparse index.

A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to the block in the sorted data file. In clustered indices with duplicate keys, the sparse index points to the lowest search key in each block.
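To make that definition a bit more concrete, here is a toy in-memory sketch. The blocks, the keys, and the names `find_block` and `lookup` are all invented for illustration. A real index would live on disk and store block addresses; here the "pointer" is just the position of the block in a list.

```python
from bisect import bisect_right

# Toy sorted data file: each inner list is one block, holding record keys only.
data_blocks = [
    [101, 105, 110],
    [120, 125, 130],
    [140, 150, 160],
]

# Sparse index: one entry per data block, holding the lowest key in that block.
# The entry's position in this list plays the role of the block pointer.
index_keys = [block[0] for block in data_blocks]


def find_block(key):
    """Return the number of the only block that could hold `key`, or None."""
    pos = bisect_right(index_keys, key) - 1  # last entry with lowest key <= key
    return pos if pos >= 0 else None


def lookup(key):
    """Return True if `key` is stored in the data file."""
    block_no = find_block(key)
    if block_no is None:
        return False                      # smaller than every key in the file
    return key in data_blocks[block_no]   # corresponds to one data-block read


print(find_block(125), lookup(125), lookup(131), lookup(100))
```

Because the data file is sorted, one index entry per block is enough to narrow the search down to a single data block.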

Okay, so we will have a pointer and a key in our index file for each block. Let's say that on our imaginary hardware, a pointer is 4 bytes long, and in our imaginary world a social security number (key) takes up 6 bytes.

So we are going to store one 10-byte key-pointer pair in our index for each data block. How many of these pairs fit into a single block?

Blocking factor of the index file = floored(5000/10) = 500

... so this means that 500 key-pointer pairs fit into a single block. And we need to store 161291 of these, so the index file will take up ceiled(161291/500) = ceiled(322.58) = 323 blocks.
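The same floor-then-ceiling pattern applies to the index file. A small sketch of that arithmetic, with constant names of my own choosing and the sizes taken from the example:

```python
BLOCK_SIZE = 5000        # bytes per block
KEY_SIZE = 6             # bytes per social security number
POINTER_SIZE = 4         # bytes per block pointer
DATA_BLOCKS = 161_291    # one index entry is needed per data block

entry_size = KEY_SIZE + POINTER_SIZE

# Blocking factor of the index file: whole entries per block.
entries_per_block = BLOCK_SIZE // entry_size

# Integer ceiling division: a partly filled last block still counts.
index_blocks = (DATA_BLOCKS + entries_per_block - 1) // entries_per_block

print(entry_size, entries_per_block, index_blocks)
```

The index is about 500 times smaller than the data file because it holds one short entry per block instead of one 80-byte record per person.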

The index file is ordered by key, so we can do binary search in it to find the pointer to the block which contains the record. Doing binary search in the index file costs at most ceiled(log2(323)) = ceiled(8.34) = 9 disk accesses. We also need +1 disk access to actually read the data block which the index record points to.
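The bound of 9 can be double-checked by simulating the search. This sketch treats every probe of an index block as one disk access and tries every possible target block; the helper `probes_to_find` is hypothetical, not part of any DBMS.

```python
import math

INDEX_BLOCKS = 323  # size of the index file from the calculation above


def probes_to_find(target: int, n_blocks: int) -> int:
    """Count the block reads a binary search makes to reach block `target`."""
    lo, hi, reads = 0, n_blocks - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        reads += 1                 # reading block `mid` costs one disk access
        if mid == target:
            return reads
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    raise ValueError("target out of range")


formula_bound = math.ceil(math.log2(INDEX_BLOCKS))
worst_case = max(probes_to_find(t, INDEX_BLOCKS) for t in range(INDEX_BLOCKS))
total_accesses = worst_case + 1    # plus one read of the data block itself

print(formula_bound, worst_case, total_accesses)
```

For 323 blocks the formula and the simulation agree. For an index whose size is an exact power of two, this style of binary search can need one more probe than ceiled(log2(n)), so treat the formula as an estimate.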

Wow, we got our lookup to work in 10 disk accesses. That's pretty awesome. We could even do better. :)

Okay, so you can see that blocking factor is heavily used, for example, in this calculation.

answered Sep 20 '22 01:09

Tarnay Kálmán
