I want to know what specific problems, solutions, advice, and best practices [don't punish me for the word] arise when working with huge databases. By "huge" I mean databases that have tables with millions of rows and/or hold petabytes of data. Platform-specific answers would be great too.


What do I need to know about working with huge databases?

2 Answers

Some ideas

Learn the details of your specific database engine and how it works.
How to optimize queries (hints, execution plans).
How to tune the database (not only indexes, but also physical storage and representation, and OS integration).
Query "tricks" such as temporary tables that store intermediate results for reuse.
How to evaluate whether denormalization is necessary for a performance improvement.
How to use the database's profiling tools to identify bottlenecks.
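
As a minimal sketch of two of the ideas above (reading an execution plan and reusing a temporary table), here is a small Python example using the standard-library sqlite3 module. The `orders` table, the index name, and the data are hypothetical, and SQLite stands in for whichever engine you actually use; the plan output and the tuning knobs differ from engine to engine.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(100_000)],
)


def plan(sql, params=()):
    """Return the detail text of each step in SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


query = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"

plan_before = plan(query, (42,))  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query, (42,))   # expect a search that uses the new index

print(plan_before)
print(plan_after)

# Temporary table: compute an intermediate aggregate once, then reuse it
# in several follow-up queries instead of re-aggregating each time.
conn.execute(
    "CREATE TEMP TABLE customer_totals AS "
    "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id"
)
top_customer = conn.execute(
    "SELECT customer_id FROM customer_totals ORDER BY total DESC LIMIT 1"
).fetchone()[0]
total_for_42 = conn.execute(
    "SELECT total FROM customer_totals WHERE customer_id = ?", (42,)
).fetchone()[0]
```

The point is the workflow rather than the numbers: look at the plan, change one thing, and look at the plan again.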


answered Oct 14 '22 19:10

Cătălin Pitiș

There are two aspects of a database that are more important than size, as far as design and management go.

The first is complexity. How many user tables are there? How many columns are in those tables? A database with several hundred user tables in the schema and over a thousand columns in those tables is very complex. A database with half a dozen tables is not very complex, even if it contains petabytes of data.

The second is scope of data sharing. If a database is built to share data among six or more applications, developed by separate programming teams, you should design and manage it very differently than you would a database that's embedded in a single application.

Most of the database questions asked on SO pertain to single-application databases.

Here are a few things to learn, in addition to what's already been mentioned.

Learn the difference between table partitioning and table decomposition. Some people decompose a table into multiple tables, all with the same columns, when partitioning would serve them better.
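
To make the distinction concrete, here is a rough sketch in Python with the standard-library sqlite3 module. The `sales` tables and their columns are hypothetical. SQLite has no native partitioning, so a single table stands in for the partitioned case, and the PostgreSQL declarative-partitioning DDL is included only as text for comparison; it is not executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
years = (2021, 2022, 2023)

# Decomposition: one physical table per year, all with identical columns.
for year in years:
    conn.execute(f"CREATE TABLE sales_{year} (id INTEGER, sold_on TEXT, amount REAL)")
    conn.execute(
        f"INSERT INTO sales_{year} VALUES (?, ?, ?)", (1, f"{year}-06-01", 10.0)
    )

# Every cross-year query has to name each table, and has to be edited
# whenever a table for a new year is added.
decomposed_total = conn.execute(
    "SELECT SUM(amount) FROM ("
    " SELECT amount FROM sales_2021"
    " UNION ALL SELECT amount FROM sales_2022"
    " UNION ALL SELECT amount FROM sales_2023)"
).fetchone()[0]

# Partitioning keeps ONE logical table, and the engine routes rows to the
# right partition. In PostgreSQL the DDL would look roughly like this
# (shown as text only, not run here):
POSTGRES_PARTITION_DDL = """
CREATE TABLE sales (id bigint, sold_on date, amount numeric)
    PARTITION BY RANGE (sold_on);
CREATE TABLE sales_2021 PARTITION OF sales
    FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
"""

# Stand-in for the logical table: queries stay the same as years are added.
conn.execute("CREATE TABLE sales (id INTEGER, sold_on TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, f"{year}-06-01", 10.0) for year in years],
)
logical_total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

print(decomposed_total, logical_total)
```

With decomposition, the application carries the knowledge of how the data is split. With partitioning, that knowledge lives in the database and queries are written against one table.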

Learn the real difference between the graph model of data and the relational model of data. Some people design databases as if foreign keys were essentially the same as pointers. What they end up with is a system that captures all the slowness of a relational system and all the unmanageability of a graph system.

(Note: the graph model is often called the hierarchical or network model.)

Designing a real relational database is much more subtle, and much more worthwhile, than designing a database that pretends to be modeled relationally but is really graph modeled.
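
One way the "foreign keys as pointers" habit can show up in application code is row-by-row navigation. The sketch below is an interpretation rather than the answer's own example; the `customers` and `orders` tables are hypothetical, and Python's sqlite3 module is used only because it is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 5.0), (2, 1, 7.0), (3, 2, 3.0);
    """
)

# Pointer-style navigation: treat customer_id as a pointer and "dereference"
# it one row at a time. This issues one extra query per order.
pointer_style = {}
orders = conn.execute("SELECT customer_id, amount FROM orders").fetchall()
for customer_id, amount in orders:
    (name,) = conn.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    pointer_style[name] = pointer_style.get(name, 0.0) + amount

# Relational style: describe the result as a set and let the engine
# decide how to join and aggregate.
relational_style = dict(
    conn.execute(
        "SELECT c.name, SUM(o.amount) "
        "FROM customers c JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.name"
    ).fetchall()
)

print(pointer_style)
print(relational_style)
```

Both approaches produce the same totals here, but the first one sends a query for every row it visits, which is the kind of cost that becomes painful on large tables.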

answered Oct 14 '22 20:10

Walter Mitty

Tags: sql, database, database-design, bigdata

Dan Ganiev
