I have just started internship at a small software house and I am working on an ERP. My team lead has forbidden me to create any relationships in the database. As this is my internship I was shocked because till now I have read that relationships are necessary for ensuring data integrity. My team lead told me that we can enforce the data integrity at the front end. After some research I found out that foreign keys do make the db slower but Indexing foreign keys can increase performance.
Questions
- How making foreign keys cost performance?
- Does ensuring data integrity at the front end not cost performance ? If yes then what will be difference between the performance cost by database foreign keys and cost by ensuring data integrity rules at the front end?
- If database foreign keys do make the database slower, and integrity rules at the application layer is a better approach then why do our relational databases allowed to have foreign keys at all?
- After doing some research and reading that relationships do make db slower I tried to think of a scenario where ensuring data integrity at application layer is not possible but I could not think of one, If someone can explain this, it would be great.
-
If indexing foreign keys increase performance then what would be better out of the two below:
1.) Ensuring data integrity rules at application layer
2.) Indexing Foreign Keys
Thanks for help.
In general, the more complex your data model becomes, the greater the hit to performance you will experience. However, unless your database is very large, your hardware resources very minimal, or your queries very complex, you probably will not be hindered by adding enforced relationships in your database. That is obviously a subjective statement, but "acceptable performance" is a very subjective concept that will vary from project-to-project.
The heart of your colleague's argument is correct, though, and here are a few reasons why:
- Every time you write a new record containing a foreign or primary key, the database must check that none of the keys' constraints are violated. Key columns are also indexed, so indexes must be updated when records are added.
- Every time you delete a record containing or referenced-by a foreign key, constraints are checked and the deletion may cascade to referenced tables. Indexes must also be updated when records are deleted.
- CRUD operations of all kind slow significantly as more and more tables are joined in the queries. The larger the tables, the more records that must be joined, and the slower the execution.
That said, here is why those arguments mostly don't matter:
- Indexing significantly cuts down query execution time, especially if implemented well. It is important to index tables in a way that takes advantage of the structure of the queries that will be run against it.
- Unless your database hardware is bare-bones, the operations needed to enforce data integrity and relationship constraints will probably run much faster on the back end than the front end. This is especially true if the constraint checks are happening in a client application than on a server.
- Client-based data integrity checks are much more error-prone than database constraints. Yes, if your code is perfect it will run just as well, but RDBMS software is designed for this type of thing and it is incredibly simple to implement.
- Client-based data integrity checks could lead to data synchronization problems. Think two people at different locations trying to modify a unique record. But perhaps eventual data concurrency will suffice if lightening-quick speed is your primary concern.
These all depend upon your RDBMS and project's specifications, but are good rules of thumb. In general, I would say that unless your database is so large that enforcing relationships becomes prohibitively slow, or your model is so simple that relationships are pointless (in which case, why are you using an RDBMS?), it is better to enable data integrity and relationship constraints.