Let's say I wanted to make a database that could be used to keep track of bank accounts and transactions for a user. A database that can be used in a Checkbook application.
If i have a user table, with the following properties:
And then I create an account table, which can be linked to a certain user:
And to go the next step, I create a transaction table:
Is having the user_id in the transaction table a good option? It would make the query cleaner if I wanted to get all the transactions for each user, such as:
SELECT * FROM transactions
JOIN users ON users.user_id = transactions.user_id
Or, I could just trace back to the users table from the account table
SELECT * FROM transactions
JOIN accounts ON accounts.account_id = transactions.account_id
JOIN users ON users.user_id = accounts.user_id
I know the first query is much cleaner, but is that the best way to go?
My concern is that by having this extra (redundant) column in the transaction table, I'm wasting space, when I can achieve the same result without said column.
Let's look at it from a different angle. From where will the query or series of queries start? If you have customer info, you can get account info and then transaction info or just transactions-per-customer. You need all three tables for meaningful information. If you have account info, you can get transaction info and a pointer to customer. But to get any customer info, you need to go to the customer table so you still need all three tables. If you have transaction info, you could get account info but that is meaningless without customer info or you could get customer info without account info but transactions-per-customer is useless noise without account data.
Either way you slice it, the information you need for any conceivable use is split up between three tables and you will have to access all three to get meaningful information instead of just a data dump.
Having the customer FK in the transaction table may provide you with a way to make a "clean" query, but the result of that query is of doubtful usefulness. So you've really gained nothing. I've worked writing Anti-Money Laundering (AML) scanners for an international credit card company, so I'm not being hypothetical. You're always going to need all three tables anyway.
Btw, the fact that there are FKs in the first place tells me the question concerns an OLTP environment. An OLAP environment (data warehouse) doesn't need FKs or any other data integrity checks as warehouse data is static. The data originates from an OLTP environment where the data integrity checks have already been made. So there you can denormalize to your hearts content. So let's not be giving answers applicable to an OLAP environment to a question concerning an OLTP environment.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With