 

Designing a relational database and having a looming feeling of doom

I'm part of a four-man team at a growing company of over 200 users. It's time for a massive refactoring of our proprietary software, and we're very excited to build an ideal system that we know can withstand growth for at least 5+ years. We're using a relational database, however, and while we're making some pretty great designs, I have a looming feeling that this product may be slower than we'd hope in the future.

My concern is our usage of foreign key relations. They're great for data integrity, which is why we're going with them. If we wanted to change someone's username, it would change in all related spots. That's great. Problem is, we aren't - we're relating by their ID, so the only major benefit is the performance gained from the foreign key's index.
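
A minimal sketch of what I mean, with made-up table names; ENGINE=InnoDB is assumed, since MyISAM doesn't enforce foreign keys:

    -- Relating by surrogate ID: renaming a user touches exactly one row,
    -- and the FK only guarantees that user_id points at an existing user.
    CREATE TABLE users (
        id       INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        username VARCHAR(64) NOT NULL UNIQUE
    ) ENGINE=InnoDB;

    CREATE TABLE posts (
        id      INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        user_id INT UNSIGNED NOT NULL,
        body    TEXT NOT NULL,
        FOREIGN KEY (user_id) REFERENCES users (id)
    ) ENGINE=InnoDB;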

All these indices piling up raise a red flag for me. We've got some tables that are simply linking tables, with three relational keys. They definitely have their place, and we're confident that this cuts down on the queries we'll be doing. However, I then think - we have 10,000 rows in this table, 10,000 in that one, and 10,000 in the other, and we want to add a new one. Bam! New index * 4.
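
For example, a three-way linking table like this one (again, made-up names) carries its composite primary key plus the secondary indexes InnoDB creates automatically for any foreign key column that isn't already the leftmost column of an existing index:

    -- The PRIMARY KEY covers project_id; user_id and role_id each get an
    -- automatically created secondary index to back their FKs.
    CREATE TABLE project_user_role (
        project_id INT UNSIGNED NOT NULL,
        user_id    INT UNSIGNED NOT NULL,
        role_id    INT UNSIGNED NOT NULL,
        PRIMARY KEY (project_id, user_id, role_id),
        FOREIGN KEY (project_id) REFERENCES projects (id),
        FOREIGN KEY (user_id)    REFERENCES users (id),
        FOREIGN KEY (role_id)    REFERENCES roles (id)
    ) ENGINE=InnoDB;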

This is worrying. Are there any pitfalls we'll be falling into, any advice from seasoned individuals?

Seth Goodwin asked Dec 05 '11


4 Answers

How fast is your current system? Designing a good database schema is the foundation of your entire application, and if I were to decide between speed and design, I would opt for design. There are a number of ways you can speed up your application which have nothing to do with the database itself.

If you run a parallel installation (the old system alongside the new one), you can monitor the slow query logs and head off any slowness problems in the early stages. You can also identify commonly run queries and optimize them by adding new indices or editing existing ones.
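
For instance, the slow query log can be switched on at runtime, and a hot query can then be backed by a targeted index. The threshold, table, and column names below are purely illustrative:

    -- Log any statement that takes longer than 1 second; no restart needed.
    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL long_query_time = 1;

    -- Once a hot query surfaces in the log, support it with an index
    -- (hypothetical table and columns).
    ALTER TABLE posts ADD INDEX idx_posts_user_created (user_id, created_at);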

You can also implement a caching layer which will greatly speed up your application. Caching acts as a layer between your application and the database where you can store commonly requested information in a volatile, but quickly accessible state.

Another optimization technique is to scale up (increasing a single machine's physical capacity) or scale out (adding more machines to a cluster with replication). I have seen systems with 10 million+ records run extremely fast on machines that had 64 GB of RAM. So make sure your design accounts for physical capacity.

There are a whole host of optimization techniques you can follow to ensure a speedy database: stay away from TEXT columns, avoid OR operators, stay away from ORDER BY RAND(), and limit your use of grouping operators such as GROUP BY. These are just a few examples, so do some research. To make optimization easier you can use tools such as MySQL's EXPLAIN, which will identify how painful a query may be when run through the application.
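
As a quick sketch of EXPLAIN against the hypothetical schema above - "type: ALL" in the output means a full table scan, while "ref" or "eq_ref" with a named key means an index is being used:

    EXPLAIN
    SELECT p.id, p.body
    FROM   posts p
    JOIN   users u ON u.id = p.user_id
    WHERE  u.username = 'seth';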

I'd highly recommend using Percona's MySQL build, as it is highly optimized and offers custom features.

It sounds like you and your team are traveling down the right path; don't be too worried about designing a complex system. Some software applications require complex systems to operate. The real trick is to make complex systems easy to use so you can easily support and grow them in the future. Good luck.

Mike Purcell answered Oct 31 '22


Do not discard the FKs unless you have to. There is close to a 100% chance of bad data if you do.
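
To illustrate the kind of bad data you get: without the constraint, nothing stops child rows from pointing at deleted parents, and you end up hunting for orphans after the fact with audit queries like this sketch (table names are hypothetical):

    -- Find posts whose author no longer exists - rows an FK would have
    -- prevented from ever becoming orphaned.
    SELECT p.id
    FROM   posts p
    LEFT JOIN users u ON u.id = p.user_id
    WHERE  u.id IS NULL;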

FKs will slow down inserts and deletes, but they should do so. If people who have trillions of financial records can build databases that operate fast and still have the constraints, so can you.

If you are worried about your database design working as volume gets higher, then hire a professional database designer - someone who knows the real risks of eliminating keys and denormalizing, and the kinds of performance problems that are caused by bad database design.

HLGEM answered Oct 31 '22


My concern is our usage of foreign key relations. They're great for data integrity, which is why we're going with them. If we wanted to change someone's username, it would change in all related spots. That's great. Problem is, we aren't - we're relating by their ID, so the only major benefit is the performance gained from the foreign key's index.

A foreign key is more than the equivalent of cascading updates. Correctly used, foreign keys guarantee that only valid values can be entered in the referencing column.

If you let users enter "San Francisco, Alabama" for a city name, having a really fast database doesn't matter. Whatever you mail there won't be delivered. San Francisco isn't in Alabama.
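
One way to declare that rule, sketched here with invented table names: a reference table of valid (city, state) pairs, and a composite foreign key that rejects any pair not on the list.

    CREATE TABLE cities (
        city  VARCHAR(64) NOT NULL,
        state CHAR(2)     NOT NULL,
        PRIMARY KEY (city, state)
    ) ENGINE=InnoDB;

    CREATE TABLE addresses (
        id     INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        street VARCHAR(128) NOT NULL,
        city   VARCHAR(64)  NOT NULL,
        state  CHAR(2)      NOT NULL,
        FOREIGN KEY (city, state) REFERENCES cities (city, state)
    ) ENGINE=InnoDB;

    -- Rejected by the FK: San Francisco isn't in Alabama.
    INSERT INTO addresses (street, city, state)
    VALUES ('1 Main St', 'San Francisco', 'AL');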

Every data integrity constraint you can uncover needs to be declared to the DBMS. It saves time and money. Lots of it.

Mike Sherrill 'Cat Recall' answered Oct 31 '22


There are some very smart and well-seasoned people in this industry who deliberately choose to discard referential integrity, transactions, and the other "gold standards" of database design. eBay is one of them. Their design decisions are discussed by Martin Fowler (the software engineering luminary) in this blog post.

The moral should be (IMHO): don't make assumptions; instead, make prototypes and test! Prepare quantifiable tests to verify your design decisions before you are committed. There is a wealth of unit testing frameworks that would allow you to spin up prototypes and test rigs quickly.

A video with the same protagonists is here, and another presentation is here.

Jonathan Day answered Oct 31 '22