
What does database query and insert speed depend on?

At my work we have a small database (as in about two hundred tables and maybe a million rows in total).

I've always expected it to be quite fast, on the order of several tens of thousands of insertions per second, with queries taking milliseconds once the connection is established.

Quite the contrary, we are having performance problems: we only get a couple of hundred insertions per second, and queries, even the simplest ones, take forever.

I'm not entirely sure whether that's standard behavior/performance or whether we're doing something wrong. For example, 1500 queries that each join 4 tables on a single key column take around 10 seconds. It takes 3 minutes to load 300K of XML data into the database using simple inserts, without violating any constraints.

The database is SQL Server 2005 and has a rich relational dependency model, meaning a lot of relations and categorizations over the data as well as a full set of check constraints for the categorization codes and several other things.

Are those times right? If not, what could be affecting performance? (All queries are done on indexed columns)

Asked Aug 25 '09 by Jorge Córdoba


3 Answers

To have a rough comparison: the TPC-C benchmark record for SQL Server is at around 1.2 million transactions per minute, and it has been like this for the last 4 years or so (capped by the 64-CPU OS limit). That is something in the ballpark of ~16k transactions per second. This is on super high-end machines: 64 CPUs, plenty of RAM, clients affinitized per NUMA node, and a severely short-stroked I/O system (only about 1-2% of each spindle is used). Bear in mind those are TPC-C transactions, so they consist of several operations (I think 4-5 reads and 1-2 writes each on average).

Now scale this top-of-the-line hardware down to your actual deployment, and you will get the ballpark in which to set your expectations for overall OLTP transaction processing.

For data upload, the current world record is about 1 TB in 30 minutes (if it is still current...). Several tens of thousands of inserts per second is quite ambitious, but achievable when properly done on serious hardware. The article in the link contains tips and tricks for high-throughput ETL (e.g. use multiple upload streams and affinitize them to NUMA nodes).
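For a load like the one in the question, a single bulk-load statement with a table lock and large batches usually beats row-by-row inserts by a wide margin. A minimal sketch, assuming the XML has already been shredded into a flat file (the table name, file path, and delimiters below are made up for illustration):

    -- Bulk-load a staging table in large batches; TABLOCK allows minimal logging
    -- under the simple or bulk-logged recovery models.
    BULK INSERT dbo.ImportStaging
    FROM 'C:\loads\categories.dat'
    WITH (
        TABLOCK,                 -- take a table lock for the duration of the load
        BATCHSIZE = 10000,       -- commit every 10,000 rows
        FIELDTERMINATOR = '|',
        ROWTERMINATOR = '\n'
    );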

For your situation I would advise first and foremost to measure, so you find out the bottlenecks, and then ask specific questions about how to solve those specific bottlenecks. A good starting point is the Waits and Queues whitepaper.
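To get a first picture of where the server is spending its time, the wait-statistics DMV (available in SQL Server 2005 and later) is a reasonable starting point; a rough sketch:

    -- Top waits since the last service restart (or since the stats were cleared).
    -- High WRITELOG waits, for example, point at per-insert log flushes.
    SELECT TOP (10)
           wait_type,
           wait_time_ms,
           waiting_tasks_count
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'BROKER_TASK_STOP')
    ORDER BY wait_time_ms DESC;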

Answered by Remus Rusanu


Indexing is a major factor here. When done properly, indexes can speed up SELECT statements quite well, but remember that an index will also bog down inserts, because the server updates not only the data but the indexes as well. The trick here is:

1) Determine the queries that are truly speed critical; these queries should have optimal indexes for them.

2) Fill factor is important here as well. It leaves empty space on an index page for rows added later. When an index page is full (enough rows have been inserted), a new page needs to be created, taking yet more time. However, the empty space occupies disk space. (See the example after this list.)
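As an illustration of both points, a speed-critical lookup might get a covering index built with some free space left on each leaf page (the table and column names here are hypothetical):

    -- Index tuned for a hot query; FILLFACTOR = 80 leaves roughly 20% free space
    -- per leaf page so later inserts cause fewer page splits.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
        ON dbo.Orders (CustomerId)
        INCLUDE (OrderDate, TotalAmount)
        WITH (FILLFACTOR = 80);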

My trick is this: for each application I set priorities as follows:

1) Speed of read (SELECT, some UPDATE, some DELETE) - the higher this priority, the more indexes I create
2) Speed of write (INSERT, some UPDATE, some DELETE) - the higher this priority, the fewer indexes I create
3) Disk space efficiency - the higher this priority, the higher my fill factor

Note this knowledge generally applies to SQL Server, your mileage may vary on a different DBMS.

SQL statement evaluation can help here too, but this takes a real pro; careful WHERE and JOIN analysis can help determine bottlenecks and where your queries are suffering. Turn on SHOWPLAN and look at the query plans, evaluate what you see, and plan accordingly.
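A quick way to do that from a query window; a minimal sketch in which the query itself is just a placeholder:

    -- Show the estimated plan as text instead of executing the statement.
    SET SHOWPLAN_TEXT ON;
    GO
    SELECT o.OrderId, c.Name
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c ON c.CustomerId = o.CustomerId
    WHERE o.OrderDate >= '20090101';
    GO
    SET SHOWPLAN_TEXT OFF;
    GO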

Also look at SQL Server 2008, indexed Joins!

Answered by tekiegreg


A "Rich Relational Dependency" model is not conducive to fast insert speeds. Every constraint (primary key, value checks, and especially foreign keys), must be checked for every inserted record. Thats a lot more work than a "simple insert".

And it doesn't matter that your inserts have no constraint violations; the time is probably all going into checking your foreign keys. Unless you also have triggers, because those are even worse.

Of course, it is possible that the only thing wrong is that your insert table is the parent of a "must-have-children" FK relation to another table, and someone forgot to add an index on the child-FK side of that relation (this is not automatic and is often forgotten). Of course, that's just hoping to get lucky. :-)
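A rough way to hunt for that situation; a sketch that only handles single-column foreign keys and uses the SQL Server 2005+ catalog views:

    -- List foreign-key columns on the child (referencing) side that do not
    -- lead any index on that table.
    SELECT fk.name                                              AS foreign_key,
           OBJECT_NAME(fkc.parent_object_id)                    AS child_table,
           COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS child_column
    FROM sys.foreign_keys AS fk
    JOIN sys.foreign_key_columns AS fkc
      ON fkc.constraint_object_id = fk.object_id
    WHERE NOT EXISTS (
        SELECT 1
        FROM sys.index_columns AS ic
        WHERE ic.object_id   = fkc.parent_object_id
          AND ic.column_id   = fkc.parent_column_id
          AND ic.key_ordinal = 1   -- the column is the leading key of some index
    );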

Answered by RBarryYoung