 

Handling large databases

I have been working on a web project (ASP.NET) for around six months. The final product is about to go live. The project uses SQL Server as the database. We have done performance testing with some large volumes of data, and the results show that performance degrades when the data becomes too large, say 2 million rows (timeout issues, delayed responses, etc.). At first we were using a fully normalized database, but we have now made it only partially normalized due to performance issues (to reduce joins). First of all, is that the right decision? And what are the possible solutions when the data size becomes very large, as the number of clients increases in the future?

I would like to add a few further details:

  • The 2 million rows are in entity tables; the tables that resolve relationships have many more rows.
  • Performance degrades as both the data volume and the number of users increase.
  • Denormalization was done after identifying the heavily used queries.
  • We also make heavy use of XML columns and XQuery. Could this be the cause? (See the XML indexing sketch after this list.)
  • A bit off topic: some folks on my project say that dynamic SQL queries are faster than a stored procedure approach. They have done some kind of performance testing to prove their point; I think the opposite is true. Some of the heavily used queries are dynamically created, whereas most of the other queries are encapsulated in stored procedures. (See the sp_executesql sketch after this list.)
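
For context on the XML point, if the XQuery predicates turn out to be a bottleneck, XML indexing is the usual first lever. A minimal sketch, assuming a hypothetical dbo.Orders table with a clustered primary key and an XML column named Details (none of these names are from our real schema):

```sql
-- Hypothetical example: dbo.Orders has a clustered primary key and an
-- XML column called Details that is filtered with XQuery predicates.

-- A primary XML index shreds the XML into an internal structure so the
-- XQuery predicates no longer have to parse the blob row by row.
CREATE PRIMARY XML INDEX PXML_Orders_Details
    ON dbo.Orders (Details);

-- Optional secondary index tuned for path-style predicates.
CREATE XML INDEX IXML_Orders_Details_Path
    ON dbo.Orders (Details)
    USING XML INDEX PXML_Orders_Details
    FOR PATH;

-- The kind of query these indexes are meant to speed up.
SELECT OrderId
FROM dbo.Orders
WHERE Details.exist('/order[status = "open"]') = 1;
```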
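
On the dynamic SQL versus stored procedure point, much of the difference people measure comes down to whether the dynamic statements are parameterized. A parameterized call through sp_executesql (the table and parameter names below are made up) gets a cached, reusable plan much like a stored procedure, while concatenating literal values forces a fresh compile for every distinct value:

```sql
-- Parameterized dynamic SQL: the plan is cached once and reused,
-- which is essentially what a stored procedure gives you.
DECLARE @CustomerId int;
SET @CustomerId = 42;

EXEC sys.sp_executesql
    N'SELECT OrderId, OrderDate
      FROM dbo.Orders
      WHERE CustomerId = @CustomerId',
    N'@CustomerId int',
    @CustomerId = @CustomerId;

-- By contrast, building the statement by string concatenation
-- (WHERE CustomerId = ' + CAST(@CustomerId AS nvarchar(10)) + ...)
-- produces a different plan per value and bloats the plan cache.
```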
asked Oct 03 '08 by Umer Azaz



1 Answer

In the scheme of things, a few million rows is not a particularly large database.

Assuming we are talking about an OLTP database, denormalising without first identifying the root cause of your bottlenecks is a very, very bad idea.

The first thing you need to do is profile your query workload over a representative time period to identify where most of the work is being done (for instance, using SQL Profiler, if you are using SQL Server). Look at the number of logical reads a query performs multiplied by the number of times executed. Once you have identified the top ten worst performing queries, you need to examine the query execution plans in detail.
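
As a rough sketch of what that looks like on SQL Server 2005 or later, the plan-cache DMVs expose those numbers directly (the TOP 10 cut-off and the ordering here are just a reasonable starting point, not a rule):

```sql
-- Top cached statements by cumulative logical reads
-- (reads per execution multiplied by how often the statement ran).
SELECT TOP 10
    qs.total_logical_reads,
    qs.execution_count,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    SUBSTRING(st.text,
              (qs.statement_start_offset / 2) + 1,
              ((CASE qs.statement_end_offset
                    WHEN -1 THEN DATALENGTH(st.text)
                    ELSE qs.statement_end_offset
                END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;
```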

I'm going to go out on a limb here (because it is usually the case), but I would be surprised if your problem is not either

  1. Absence of the 'right' covering indexes for the costly queries (see the sketch below)
  2. A poorly configured or under-specified disk subsystem
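
To illustrate the first point, a covering index carries every column a hot query touches, so the optimizer can satisfy the query from the index alone without key lookups back to the table. The table and column names below are invented for the example:

```sql
-- Hypothetical hot query:
--   SELECT OrderDate, TotalDue
--   FROM dbo.Orders
--   WHERE CustomerId = @CustomerId;

-- Covering index: seek on CustomerId, with the selected columns held
-- at the leaf level so no key lookup into the base table is needed.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Covering
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderDate, TotalDue);
```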

This SO answer describes how to profile to find the worst performing queries in a workload.

answered Oct 16 '22 by Mitch Wheat