Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tips for writing a DBMS

Tags:

c++

database

I have taken a graduate level course which is just one big project - to write a DBMS.

The objective is not to reinvent the wheel and make an enterprise DBMS to rival Oracle. Only a small subset of SQL commands need to be supported. Nor is the objective to create some fancy hybrid model DBMS for storing multimedia or something. It has to be a traditional RDBMS.

The main goal of the project is to use programing techniques to take advantage of modern architectures (multicore processors) to build a high performing database (speed, load).

I was just wondering if there were any resources on query evaluations, optimizers, data structures ideal for DBMSes or basically anything that could help me create a standout project. The professor was throwing around terms like metaprogramming for example.

The project must be done entirely in C++.


Thanks for the replies so far! I cannot optimize an existing DBMS such as MySQL as the project requires you to build your own DBMS from scratch. Yes I know this is pretty much reinventing the wheel for most part, but there is scope for some novel query evaluation and optimization algorithms. If you know any good resources or books dealing with this specific area, then please tell me!

like image 563
user245120 Avatar asked Jan 09 '10 01:01

user245120


4 Answers

First you need to learn about relational calculus and make a compiler to deal with making it from sql, thankfully sql is an easy language and this is not bad.

Then get familiar with bx-trees for your indexes. Then make a commit and rollback space and that is pretty much all there is to it. It's not rocket science, compared to other projects you might undertake, but it's definitely something you better start right away if you want a good result by the end of the semester/year.

edit: Oh, and as for modern architecture goes, trees don't usually benefit much from multithreading. Neither do disk reads. On the other hand, it's crucial for high performance to use the whole of your memory using OS level calls, not just the memory normally addressable in a process.

like image 76
Charles Eli Cheese Avatar answered Oct 21 '22 07:10

Charles Eli Cheese


As your looking to take advantage of modern CPU architectures, it might be worth looking at the MonetDB project. The project has produced a lot of research around optimising databases for modern CPU architecture, using column stores and storing compressed pages in memory - only decompressing them in the CPU cache to get significant speeds for very large databases.

This approach (column oriented storage + compression) and a more traditional query engine, perhaps based on the SQLite engine, should be a good basis for a project.

like image 34
Robert Christie Avatar answered Oct 21 '22 07:10

Robert Christie


Since your professor mentioned metaprogramming, you might want to look at the following:

  1. WAM - Warren Abstract Machine. This compiles prolog code into a set of instructions that can be executed on an abstract machine. The idea is similar to jvm and cli. You don't need to go into this in detail, just understand the idea of an abstract machine.

  2. JVM, CLI - same as above.

  3. Tools such as lex, yacc, flex, bison. Since you will be writing essentially an interpreter/compiler for SQL commands, you probably want to use some tools. This can be viewed as a form of metaprogramming, since you are using a language to write a tool - so you are programming at the meta-level.

  4. Again, the idea of meta-programming - perhaps you can augment your language with constructs that will allow your SQL compiler/interpreter to automatically optimize for parallel queries. These can be implemented as hints etc. to the compiler.

  5. Recompilers - you might want to write an interpreter/compiler that recompiles the initial queries into ones that can run in parallel for your target architecture. For example, for an N-core architecture, it might recompile a query into N-subqueries that execute in parallel, then combine the results.

I'm not sure that you should go into a great deal of research into standard optimization practices. These can be complex, and the subject of a lifetime of research in themselves. Since the object of the exercise is to take advantage of parallel processing, and meta-programming, that should be the focus of your research.

like image 22
Larry Watanabe Avatar answered Oct 21 '22 06:10

Larry Watanabe


Except for proprietary issues, how about optimizing MySQL in such ways? This is no trivial task though. Query optimization which takes advantage of parallel processing might be an entire term's work.

It's better to stand on the shoulders of giants to reach upward than to stand beside them.

like image 37
wallyk Avatar answered Oct 21 '22 06:10

wallyk