I want to know what specific problems/solutions/advices/best-practices [don't punish me for the word] are arising while working with huge databases.
Under huge I imply databases, which have tables with millions of rows and/or databases with petabytes of data.
Platform-oriented answers will be great too.
VLDB is primarily an enterprise class database. Although there is no specific limitation of a VLDB, it can consist of billions of records and have a cumulative size in thousands of gigabytes, or even some hundred terabytes.
One way to improve your database management skills is to get certified. There are several different types of database certifications available, and each one can help you demonstrate your proficiency in different areas. Another way to improve your skills is to take an online course.
Some ideas
Learn the details of the specific database engine, how it works
How to optimize queries (hints, execution plans)
How to tune the database (not only indexes, but physical storage and representation, OS integration).
Query "tricks" like temporary tables to store temporary results that can be reused,
How to evaluate the necessity of denormalization for performance improvement
How to use profiling tools for the database, to identify the bottlenecks.
There are two aspects of a database that are more important than size, as far as design and management goes.
The first is complexity. How many user tables are there? How many columns in those tables? A database with several hundred user tables in the schema and over a thousand columns in those tables is very complex. A database with a half a dozen tables is not very complex, even if it contains petabytes of data.
The second is scope of data sharing. If a database is built to share data among six or more applications, developed by separate programming teams, you should design and manage it very differently than you would a database that's embedded in a single application.
Most of the database questions asked in SO pertain to single application databases.
Here are a few things to learn, in addition to what's already been mentioned.
Learn the difference between table partition and table decomposition. Some people decompose tables into multiple tables all with the same columns, when partitioning would serve them better.
Learn the real difference between the graph model of data and the relational model of data. Some people design databases as if foreign keys were essentially the same as pointers. What they end up with is a system that captures all the slowness of a relational system and all the unmanageability of a graph system.
(Note: the graph model is often called the hiearachical or network model).
Designing a real relational database is much more subtle, and much more worthwhile, than designing a database that pretends to be modeled relationally but is really graph modeled.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With