Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What do I need to know about working with huge databases?

I want to know what specific problems/solutions/advices/best-practices [don't punish me for the word] are arising while working with huge databases.

Under huge I imply databases, which have tables with millions of rows and/or databases with petabytes of data.

Platform-oriented answers will be great too.

like image 660
Dan Ganiev Avatar asked Sep 14 '10 18:09

Dan Ganiev


People also ask

What is considered a very large database?

VLDB is primarily an enterprise class database. Although there is no specific limitation of a VLDB, it can consist of billions of records and have a cumulative size in thousands of gigabytes, or even some hundred terabytes.

How can you improve your skills in making databases?

One way to improve your database management skills is to get certified. There are several different types of database certifications available, and each one can help you demonstrate your proficiency in different areas. Another way to improve your skills is to take an online course.


2 Answers

Some ideas

  • Learn the details of the specific database engine, how it works

  • How to optimize queries (hints, execution plans)

  • How to tune the database (not only indexes, but physical storage and representation, OS integration).

  • Query "tricks" like temporary tables to store temporary results that can be reused,

  • How to evaluate the necessity of denormalization for performance improvement

  • How to use profiling tools for the database, to identify the bottlenecks.

like image 156
Cătălin Pitiș Avatar answered Oct 14 '22 19:10

Cătălin Pitiș


There are two aspects of a database that are more important than size, as far as design and management goes.

The first is complexity. How many user tables are there? How many columns in those tables? A database with several hundred user tables in the schema and over a thousand columns in those tables is very complex. A database with a half a dozen tables is not very complex, even if it contains petabytes of data.

The second is scope of data sharing. If a database is built to share data among six or more applications, developed by separate programming teams, you should design and manage it very differently than you would a database that's embedded in a single application.

Most of the database questions asked in SO pertain to single application databases.

Here are a few things to learn, in addition to what's already been mentioned.

Learn the difference between table partition and table decomposition. Some people decompose tables into multiple tables all with the same columns, when partitioning would serve them better.

Learn the real difference between the graph model of data and the relational model of data. Some people design databases as if foreign keys were essentially the same as pointers. What they end up with is a system that captures all the slowness of a relational system and all the unmanageability of a graph system.

(Note: the graph model is often called the hiearachical or network model).

Designing a real relational database is much more subtle, and much more worthwhile, than designing a database that pretends to be modeled relationally but is really graph modeled.

like image 41
Walter Mitty Avatar answered Oct 14 '22 20:10

Walter Mitty