What is the blocking factor in a DBMS,
The bit I looked at said it was the floored value of blocks per record (so B/R floor), where B is block size and R is records. I was just wondering, can someone tell me the main reason its used, and also whether it is actually FLOORED?
My understanding of FLOORED is 1.5 gets floored to 1.0, for anyone that is wondering.
What is a Blocking Factor? A blocking factor is a factor used to create blocks. It is some variable that has an effect on an experimental outcome, but is itself of no interest. Blocking factors vary wildly depending on the experiment. For example: in human studies age or gender are often used as blocking factors.
A variable's blocking factor specifies the minimum number of records the library allocates when you write to an unallocated record.
Typically, a blocking factor is a source of variability that is not of primary interest to the experimenter. An example of a blocking factor might be the sex of a patient; by blocking on sex, this source of variability is controlled for, thus leading to greater accuracy.
Yes, it means how many whole records fit into a block.
(A block is the smallest unit of data that the underlying storage system (hdd, san fs, etc) is willing to deal in. It's size is traditionally 512 bytes for hard drives.)
It is floored because if 100 and a half record would fit, one only stores 100 records per block.
Blocking factor is pretty heavily used in many dbms related calculations.
For example:
We have 10 000 000 records. Each record is 80 bytes long. Each record contains an unique key (Lets say social security numbers). We want looking up someone by their social security number to be fast.
We need something to measure performance by. The thing that takes the most time is asking a block from the harddisk. You know, it is a mechanical device. It has to reposition its head, and blabla, so it really a slow operation when compared to how fast the CPU is, or compared to how fast operative memory(RAM) access is. Okay, lets say that we measure the performance of an operation by how many disk accesses it takes. We want to minimize the number of disk accesses. Okay, now we know how to tell if something is slow or fast.
Many disk accesses -> bad
Very few disk accesses -> good
Lets say that on our imaginary hw, each block is 5000 byte. We want calculate how many blocks we need. First, we need to know how many records fit into a single block:
Blocking factor
= floored((Block size)/(Record size))
= floored(5000/80)
= floored(62.5)
= 62 record/block
And we have 10000000 records, so we need ceiled(10000000/62)=ceiled(161290.32)=161291
blocks to store all that data.
If one were to read all the blocks to find a single record by the key (social security number), then that would take 161291 disk accesses. Not good.
We can do better. Lets build an index file. We will build a sparse index.
A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to the block in the sorted data file. In clustered indices with duplicate keys, the sparse index points to the lowest search key in each block.
Okay, so we will have a pointer and a key in our index file for each block. Lets say that on our imaginary hw, a pointer is 4 bytes long, and in our imaginary world a social security number (key) takes up 6 bytes.
So we are going to store one 10 byte long key-pointer pair for each block in our index. How many of these pairs fit into a single block?
Blocking factor of the index file = floored(5000/10) = 500
... so this means that 500 key-pointer pairs fit into a single block. And we need to store 161291 of these, so the index file will take up ceiled(161291/500)=323
blocks
The index file is ordered by key, so we can do binary search in it to find the pointer to the block which contains the record. Doing binary search in the index file costs at most ceiled(log2(323))=9
disk acceses. We also need +1
disk access to actually read the data block which the index record points to.
Wow, we got our lookup to work in 10 disk accesses. That's pretty awesome. We could even do better. :)
Okay, so you can see that blocking factor is heavy used for example in this calculation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With