In a transactional system with a highly normalized DB, reporting-style queries, or even queries that display data on a UI, can involve several joins, which in a data-heavy scenario can, and usually does, hurt performance. Joins are expensive.
The usual guidance is that you should never run these queries against your transactional DB model; instead, you should use a denormalized, flattened model tailored to specific UI views or reports, which eliminates the need for many joins. Data duplication is not an issue in this scenario.
This concept makes perfect sense, but what I rarely see when experts make these statements is exactly HOW to implement it. For example (and quite frankly I'd appreciate an example using any platform), say you have a mid-sized system running on a SQL Server back end with a normalized transactional model. You also have some reports and a website that require queries. So you create a "reporting" database that flattens the normalized data. How do you keep this in sync? Transaction log shipping? If so, how do you transform the data to fit the reporting model?
Transactional databases are row-stores, which means that data is stored on disk as rows rather than columns. Row-stores work well when you need to know everything about one customer in the user table, since the whole record is stored together and can be read in one go.
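As a rough illustration of the row-store versus column-store distinction, here is a minimal in-memory sketch. The customer data and function names are made up for the example, and plain Python lists stand in for on-disk storage, so this shows only the access pattern, not real I/O cost.

```python
# Row-oriented layout: all fields of one record are stored together.
rows = [
    (1, "Ann", "ann@example.com", 120.0),
    (2, "Bob", "bob@example.com", 75.5),
    (3, "Cy", "cy@example.com", 310.25),
]

# Column-oriented layout: all values of one attribute are stored together.
columns = {
    "id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
    "email": ["ann@example.com", "bob@example.com", "cy@example.com"],
    "balance": [120.0, 75.5, 310.25],
}


def customer_from_rows(position):
    # A single lookup returns every field of the customer.
    return rows[position]


def customer_from_columns(position):
    # The same record needs one lookup per column.
    return tuple(columns[name][position]
                 for name in ("id", "name", "email", "balance"))


def total_balance_from_rows():
    # An aggregate has to walk every full record.
    return sum(row[3] for row in rows)


def total_balance_from_columns():
    # The same aggregate touches a single column only.
    return sum(columns["balance"])
```

Fetching one whole customer is the cheap operation in the row layout, while summing one attribute across all customers is the cheap operation in the column layout. That is roughly why OLTP workloads favor rows and reporting workloads often favor columns.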
If a transactional database system loses electrical power halfway through a transaction, the partially completed transaction will be rolled back and the database will be restored to the state it was in before the transaction started. A reporting database, by contrast, is simply a database used by reporting applications.
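The rollback behavior can be sketched with Python's built-in `sqlite3` module. Note the assumptions: SQLite stands in for whatever engine you actually run, the `account` table and `transfer` helper are hypothetical, and a raised exception only simulates a failure inside the application. A real power loss is handled by the engine's own crash recovery, not by application code like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, src),
        )
        if fail_midway:
            # Simulated failure after the first half of the work is done.
            raise RuntimeError("simulated failure")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )
        conn.commit()
    except RuntimeError:
        # Undo the partial work so the data matches the pre-transaction state.
        conn.rollback()


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

Calling `transfer(conn, 1, 2, 30, fail_midway=True)` leaves both balances untouched, whereas the same call without the flag moves the 30 units.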
Steps in a transaction: locate the record to be updated in secondary storage, transfer the disk block into the memory buffer, make the update to the tuple in the buffer, and write the modified block back out to disk.
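Those four steps can be mimicked with a toy model. This is only a sketch: the "disk" is a Python list of blocks, each block a list of `(key, value)` tuples, and `update_record` is a made-up helper. A real engine adds logging, locking, and a shared buffer pool on top of this.

```python
# Toy "disk": a list of blocks, each holding a few (key, value) tuples.
disk = [
    [(1, "pending"), (2, "pending")],
    [(3, "pending"), (4, "pending")],
]


def update_record(disk, key, new_value):
    # Step 1: locate the block that holds the record.
    for block_no, block in enumerate(disk):
        if any(k == key for k, _ in block):
            break
    else:
        raise KeyError(key)

    # Step 2: transfer the block into a memory buffer (a copy).
    buffer = list(disk[block_no])

    # Step 3: update the tuple inside the buffer.
    buffer = [(k, new_value if k == key else v) for k, v in buffer]

    # Step 4: write the modified block back out to "disk".
    disk[block_no] = buffer
    return block_no
```

For example, `update_record(disk, 3, "shipped")` rewrites only the second block and leaves the first one as it was.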
Transactional data is information that is captured from transactions. It records the time of the transaction, the place where it occurred, the price points of the items bought, the payment method employed, discounts if any, and other quantities and qualities associated with the transaction.
In our shop, we set up a continuous transactional replication from the OLTP system to another DB server used for reporting. You wouldn't want to use log shipping for this purpose as it requires an exclusive lock on the database every time it restores a log, which would prevent your users from running reports.
With the optimizer in SQL Server today, I think the notion that joins on a normalized database are "too expensive" for reporting is a bit outdated. Our design is fully in third normal form, with several million rows in our main tables, and we have no problems running any of our reports. Having said that, if push came to shove, you could look into creating some indexed views on your reporting server to help out.
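To make the flattening idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Everything in it is an assumption for illustration: SQLite stands in for SQL Server, the `customer`, `orders`, and `customer_order_summary` tables are hypothetical, and the summary table is refreshed by hand. A SQL Server indexed view is maintained by the engine itself as the base tables change, so this only imitates the result, not the mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized tables, as they would arrive on the reporting server.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total INTEGER NOT NULL
    );
    -- Flattened reporting table (a manual stand-in for an indexed view).
    CREATE TABLE customer_order_summary (
        customer_id INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        order_count INTEGER NOT NULL,
        total_spent INTEGER NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 40), (2, 1, 60), (3, 2, 25);
""")


def refresh_summary(conn):
    """Rebuild the flattened table from the normalized ones in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM customer_order_summary")
        conn.execute("""
            INSERT INTO customer_order_summary
            SELECT c.id, c.name, COUNT(o.id), COALESCE(SUM(o.total), 0)
            FROM customer AS c
            LEFT JOIN orders AS o ON o.customer_id = c.id
            GROUP BY c.id, c.name
        """)


def report(conn):
    # The report reads the flattened table and needs no join at query time.
    return conn.execute(
        "SELECT customer_name, order_count, total_spent "
        "FROM customer_order_summary ORDER BY customer_id"
    ).fetchall()
```

The join cost is paid once per refresh instead of once per report. A full rebuild like this is the simplest option; whether it is acceptable depends on how stale the reports are allowed to be and how large the tables are.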