How do databases work internally? [closed]

Tags: database, reference, internals

I've been working with databases for the last few years and I'd like to think that I've become fairly competent at using them. However, I was recently reading about Joel's Law of Leaky Abstractions and I realised that even though I can write a query to get pretty much anything I want out of a database, I have no idea how the database actually interprets the query. Does anyone know of any good articles or books that explain how databases work internally?

Some specific things I'm interested in are:

What does a database actually do to find out what matches a select statement?
How does a database interpret a join differently to a query with several "where key1 = key2" statements?
How does the database store all its memory?
How are indexes stored?

218

asked Oct 06 '08 00:10

Bonnici

2 Answers

What does a database actually do to find out what matches a select statement?

To be blunt, it's a matter of brute force. Put simply, the database reads through each candidate record and evaluates the expression against its fields. So, if you have "select * from table where name = 'fred'", it literally runs through each record, grabs the "name" field, and compares it to 'fred'.

Now, if the "table.name" field is indexed, then the database will (likely, but not necessarily) use the index first to locate the candidate records to apply the actual filter to.

This reduces the number of candidate records the expression has to be applied to. Otherwise, it will just do what we call a "table scan", i.e. read every row.

But fundamentally, how it locates the candidate records is separate from how it applies the actual filter expression, and, obviously, there are some clever optimizations that can be done.
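As a rough illustration of the difference between a table scan and an index-assisted lookup, here is a minimal Python sketch. The `rows` table, the helper names, and the dict-based index are all made up for the example; a real engine works on disk pages rather than Python lists, so treat this as a picture of the idea only.

```python
from collections import defaultdict

# Hypothetical in-memory "table": a list of row dicts.
rows = [
    {"id": 1, "name": "fred"},
    {"id": 2, "name": "wilma"},
    {"id": 3, "name": "fred"},
    {"id": 4, "name": "barney"},
]

def table_scan(table, predicate):
    """Read every row and keep those for which the filter holds."""
    return [row for row in table if predicate(row)]

def build_index(table, column):
    """Map each value of `column` to the positions of the rows holding it."""
    index = defaultdict(list)
    for pos, row in enumerate(table):
        index[row[column]].append(pos)
    return index

def index_lookup(table, index, key, predicate):
    """Use the index to find candidate rows, then apply the filter to those only."""
    candidates = (table[pos] for pos in index.get(key, []))
    return [row for row in candidates if predicate(row)]

def is_fred(row):
    return row["name"] == "fred"

name_index = build_index(rows, "name")

scan_result = table_scan(rows, is_fred)                          # touches 4 rows
index_result = index_lookup(rows, name_index, "fred", is_fred)   # touches 2 rows
```

Both calls return the same rows; the only difference is how many candidates the filter has to look at.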

How does a database interpret a join differently to a query with several "where key1 = key2" statements?

Well, a join is used to make a new "pseudo table", upon which the filter is applied. So, you have the filter criteria and the join criteria. The join criteria are used to build this "pseudo table", and then the filter is applied against that. Now, when interpreting the join, it's the same issue as with the filter: brute-force comparisons and index reads to build the subset for the "pseudo table".
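To make the "pseudo table" idea concrete, here is a small Python sketch under the same caveats as before: the `customers` and `orders` tables and both function names are invented for illustration, and real planners choose among several join strategies that are not shown here.

```python
# Hypothetical tables, represented as lists of dicts.
customers = [
    {"cust_id": 1, "name": "fred"},
    {"cust_id": 2, "name": "wilma"},
]
orders = [
    {"order_id": 10, "cust_id": 1, "total": 25},
    {"order_id": 11, "cust_id": 2, "total": 40},
    {"order_id": 12, "cust_id": 1, "total": 5},
]

def nested_loop_join(left, right, join_condition):
    """Brute force: compare every left row with every right row."""
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if join_condition(lrow, rrow)
    ]

def index_join(left, right, key):
    """Same result, but matching right rows are found through an index on `key`."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]

# The join criteria build the "pseudo table"...
joined = nested_loop_join(
    customers, orders, lambda c, o: c["cust_id"] == o["cust_id"]
)
# ...and the filter criteria are then applied to it.
big_orders = [row for row in joined if row["total"] > 20]
```

`index_join(customers, orders, "cust_id")` produces the same joined rows while avoiding the all-pairs comparison.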

How does the database store all its memory?

One of the keys to a good database is how it manages its I/O buffers. Basically, it maps RAM blocks to disk blocks. With modern virtual memory managers, a simpler database can almost rely on the VM as its memory buffer manager. The high-end DBs do all this themselves.
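A buffer manager of this kind can be sketched in a few lines of Python. Everything here is an assumption made for the example: the `BufferPool` class, the 4096-byte block size, the fake disk reader, and the least-recently-used eviction policy (actual engines use a variety of policies and also have to write modified blocks back, which is left out).

```python
from collections import OrderedDict

BLOCK_SIZE = 4096  # assumed block size for this sketch

class BufferPool:
    """Caches a fixed number of disk blocks in RAM, evicting the least recently used."""

    def __init__(self, read_block, capacity):
        self.read_block = read_block  # callable: block number -> bytes
        self.capacity = capacity
        self.frames = OrderedDict()   # block number -> cached bytes
        self.disk_reads = 0

    def get(self, block_no):
        if block_no in self.frames:
            self.frames.move_to_end(block_no)  # mark as most recently used
            return self.frames[block_no]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)    # evict the least recently used block
        data = self.read_block(block_no)
        self.disk_reads += 1
        self.frames[block_no] = data
        return data

# Stand-in for a real disk: each "block" is filled with its own number.
def fake_disk_read(block_no):
    return bytes([block_no % 256]) * BLOCK_SIZE

pool = BufferPool(fake_disk_read, capacity=2)
for block_no in [0, 1, 0, 2, 1]:
    pool.get(block_no)
```

With room for only two blocks, the second request for block 0 is served from RAM, while block 1 has been evicted by the time it is requested again and must be re-read.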

How are indexes stored?

Typically as B+Trees; you should look them up. It's a straightforward technique that has been around for years. Its benefit is shared with almost any balanced tree: consistent access time to the nodes. In addition, all the leaf nodes are linked, so you can easily traverse from node to node in key order. So, with an index, the rows can be considered "sorted" on specific fields in the database, and the database can leverage that information to its benefit for optimizations. This is distinct from, say, using a hash table for an index, which only lets you get to a specific record quickly. In a B-Tree you can quickly get not just to a specific record, but to a point within a sorted list.
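The contrast between a sorted index and a hash index can be shown with a short Python sketch. Note that this is not a B+Tree: a sorted list searched with the standard `bisect` module stands in for the linked leaf level, and the index entries and `range_scan` helper are invented for the example.

```python
import bisect

# Hypothetical index entries: (key, row id), kept in key order like B+Tree leaves.
sorted_index = sorted([
    ("barney", 4), ("betty", 5), ("fred", 1), ("pebbles", 6), ("wilma", 2),
])
keys = [key for key, _ in sorted_index]

# A hash index, by contrast, only supports exact-match lookups.
hash_index = {key: row_id for key, row_id in sorted_index}

def range_scan(low, high):
    """Return row ids for low <= key < high: seek once, then walk in key order."""
    start = bisect.bisect_left(keys, low)
    result = []
    for key, row_id in sorted_index[start:]:
        if key >= high:
            break
        result.append(row_id)
    return result

exact = hash_index["fred"]        # both kinds of index can do this
b_names = range_scan("b", "c")    # only the sorted index can do this cheaply
from_f = range_scan("f", "q")
```

The hash index finds "fred" directly but has no notion of "all names starting with b"; the sorted structure answers that by seeking to the first candidate and reading forward until the range ends.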

The actual mechanics of storing and indexing rows in the database are really pretty straightforward and well understood. The game is managing buffers and converting SQL into efficient query paths that leverage these basic storage idioms.

Then there's the whole complexity of multiple users, locking, logging, and transactions on top of the storage idiom.

answered Oct 21 '22 00:10

Will Hartung

What does a database actually do to find out what matches a select statement?

DBs use indexes (see below).

How does a database interpret a join differently to a query with several "where key1 = key2" statements?

Join operations can be translated to binary tree operations by merging trees.

How does the database store all its memory?

Memory-mapped files, for faster access to their data.

How are indexes stored?

Internally, DBs work with B-Trees for indexing.

This should be explained in greater detail on Wikipedia:

http://en.wikipedia.org/wiki/B-tree

http://en.wikipedia.org/wiki/Database

answered Oct 21 '22 00:10

Peter Parker
