I have a system that holds a large amount of data. The database is SQL Server. One of the tables has around 300,000 rows, and there are quite a few tables of this size. This table is updated regularly; we call this the "transactional database", since it is where the transactions happen.
Now we need to implement reporting functionality. Some of the architects are proposing a separate database that is a copy of this database plus some additional tables for reporting. They propose this because they do not want to disrupt the transactional database. For this, data has to be moved to the reporting database frequently. My question is: is a second database really required for this purpose? Can we use the transactional database itself for reporting? Since the data has to be moved to a different database, there will be latency, which is not the case if the transactional database itself is used for reporting. I would appreciate some expert advice.
SQL databases provide great benefits for transactional data whose structure doesn't change frequently (or at all) and where data integrity is paramount. They are also well suited to fast analytical queries. NoSQL databases provide much more flexibility and scalability, which lend themselves to rapid development and iteration.
Oracle, DB2, Microsoft SQL Server, Microsoft Access, and MySQL are popular relational databases today. They are easy to use and maintain. Database reporting tools rely on connections to a relational database management system via JDBC, JNDI, or ODBC.
A transactional database enforces all four ACID properties through transactions: Atomicity, Consistency, Isolation, and Durability.
Examples of transactional data include:
Financial transactional data: insurance costs and claims data, purchases or sales, and bank deposits or withdrawals.
Logistical transactional data: shipping status, shipping partner data.
Work-related transactional data: employee hours tracking.
You need to do some research into ETLs, Data Warehousing and Reporting databases, as I think your architects may be addressing this in a good way. Since you don't give details of the actual reports I'll try and answer the general case.
(Disclaimer: I work in this field and we have products geared to this)
Transactional databases are optimised for a good balance between read/update/insert, and the indexes and table normalisations are geared to this effect.
Reporting databases are optimised for read access above all else. This means that the 'normal' normalisation rules that one would apply to a transactional database don't necessarily apply. In fact, a high degree of de-normalisation may be in place to make the report queries far more efficient and simpler to manage.
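To make the de-normalisation point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for SQL Server, and every table and column name (customers, orders, report_orders) is made up for illustration. The idea is only that the reporting table repeats customer attributes on each row so the report query needs no join.

```python
import sqlite3

# SQLite stands in for SQL Server; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalised tables, as a transactional schema would have them
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        order_date TEXT,
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Acme', 'North'), (2, 'Globex', 'South');
    INSERT INTO orders VALUES
        (1, 1, '2024-01-05', 100.0),
        (2, 1, '2024-01-20', 50.0),
        (3, 2, '2024-02-02', 75.0);

    -- De-normalised reporting table: customer attributes are copied onto each row
    CREATE TABLE report_orders AS
    SELECT o.id AS order_id, o.order_date, o.amount,
           c.name AS customer_name, c.region
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id;
""")

# The report query reads a single table and needs no join
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM report_orders GROUP BY region ORDER BY region"
).fetchall()
print(sales_by_region)
```

The trade-off is the usual one: the reporting table duplicates data and has to be rebuilt or refreshed when the source changes, which is acceptable for reads but not something you would want in the transactional schema.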
Running complex queries on a transactional database (especially aggregations over extended data ranges such as historical time frames) may degrade performance to the point that the key users of the database, the transaction generators, are negatively impacted.
Though a reporting database may not be required in your situation, you may find that it's simpler to keep the two use cases separate.
Your concern about data latency is a real one. It can only be answered by the business users who will consume the reports. Often people say "We want real-time info" when in fact most, if not all, of their requirements are covered by non-real-time info. The acceptable degree of data staleness can only be decided by them.
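As a rough illustration of where the latency comes from, here is a sketch of a periodic copy job. It is an assumption-laden toy, not a recommendation for a specific tool: two in-memory SQLite databases stand in for the transactional and reporting SQL Server databases, the orders table and the copy_new_rows function are hypothetical, and it only handles newly inserted rows (updated or deleted rows would need some form of change tracking).

```python
import sqlite3

source = sqlite3.connect(":memory:")     # stands in for the transactional database
reporting = sqlite3.connect(":memory:")  # stands in for the reporting database

source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
reporting.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

def copy_new_rows(source, reporting):
    """Copy rows the reporting side has not seen yet; return how many were copied."""
    # The highest id already copied acts as the watermark
    last_id = reporting.execute(
        "SELECT COALESCE(MAX(id), 0) FROM orders"
    ).fetchone()[0]
    rows = source.execute(
        "SELECT id, amount FROM orders WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    with reporting:  # commit the whole batch together
        reporting.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", rows)
    return len(rows)

source.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (20.0,)])
source.commit()
first_run = copy_new_rows(source, reporting)

# A new transaction arrives after the copy job has run
source.execute("INSERT INTO orders (amount) VALUES (?)", (30.0,))
source.commit()

# Until the next run the reporting copy lags behind: this gap is the latency
rows_behind = (
    source.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    - reporting.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
)
second_run = copy_new_rows(source, reporting)
```

With a job like this, the reports are never staler than the interval between runs, so the question for the business users becomes how long that interval is allowed to be.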
In fact, I'd suggest that you take your research slightly further and look at multidimensional cubes for your reporting concerns, as opposed to just reporting databases. They are designed to abstract your reporting concerns to a whole new level.
I second Hubson's answer. I may not be a seasoned SQL Server developer myself, but I have dealt with big tables (around 1m rows), so I have some experience with this.
Referring to this SE answer, I can say that multiple DBs on the same hard disk won't give a performance boost, because they are limited by the I/O capacity of that disk. If you can put the reporting DB on a different hard disk, then you gain the benefit of having one disk handle the intensive I/O and the other be read-only.
And if both databases exist in the same instance, they share the same memory and tempdb, which gives no benefit to performance and does not reduce I/O cost at all.
Moreover, 300k rows is not a big deal, unless the table is joined with 3 other 300k-row tables, or the query is very complex and requires data cleanup, etc. It is a different story, though, if your data is going to grow quickly in the future.
What can you do to improve report performance without impacting the performance of the operational db?
Proper indexing
Apart from requiring some storage, proper indexing can lead to faster data processing, and you will be amazed at how much it speeds things up.
Proper locking
NoLock is, imho, the best to use for reporting, unless your database uses a locking strategy other than the serialized one. Some skew in the report results caused by uncommitted transactions usually does not matter much.
Summarize data
A scheduled process that generates summarized data can also be used, so that reports do not have to recalculate the figures every time they are read.
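To illustrate the indexing point from the list above, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (where you would look at the execution plan instead), and the orders table, the index name and the query_plan helper are all hypothetical. It only shows that the same report query gets a different plan once an index exists on the filtered column.

```python
import sqlite3

# SQLite stands in for SQL Server; table, column and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
    [(f"2024-{(i % 12) + 1:02d}-15", float(i)) for i in range(1000)],
)

# A typical report query: an aggregate over a date range
report_sql = (
    "SELECT SUM(amount) FROM orders "
    "WHERE order_date >= '2024-03-01' AND order_date < '2024-04-01'"
)

def query_plan(sql):
    """Return the plan SQLite reports for the statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = query_plan(report_sql)  # no usable index yet: the table is scanned
conn.execute("CREATE INDEX ix_orders_order_date ON orders (order_date)")
plan_after = query_plan(report_sql)   # the plan now refers to the new index

total_march = conn.execute(report_sql).fetchone()[0]
print(plan_before)
print(plan_after)
```

The index costs storage and slows writes slightly, so on a transactional table it is worth adding only the indexes the reports actually need.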
Edit:
So, what is the benefit of having the second database? It is still beneficial to have one, even though it does not give a direct performance benefit. A second database can be used to keep the transactional db clean and separated from reporting activity. Its benefits:
Keeping the materialized data
For example, a summary of the total profit generated each month can be stored in a table that belongs to this specific db.
Keeping the reporting logics
You can grant access to specific people, separately from access to the transactional db.
The files generated for this db are separate from those of the transactional db. That makes backup/restore easier (and independent of the transactional db), and makes it easier to move the db to a different hard disk later.
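The monthly profit example from the list above could look something like the following. This is only a sketch: sqlite3 stands in for SQL Server, the sales and monthly_profit tables and the refresh_monthly_profit function are invented for illustration, and in practice the refresh would be triggered by whatever scheduler you use (for example a SQL Server Agent job).

```python
import sqlite3

# SQLite stands in for SQL Server; table and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, sold_on TEXT, profit REAL);
    CREATE TABLE monthly_profit (month TEXT PRIMARY KEY, total_profit REAL);
    INSERT INTO sales (sold_on, profit) VALUES
        ('2024-01-10', 20.0), ('2024-01-25', 30.0), ('2024-02-03', 15.0);
""")

def refresh_monthly_profit(conn):
    """Rebuild the summary table; a scheduler would call this periodically."""
    with conn:  # the delete and the insert are committed together
        conn.execute("DELETE FROM monthly_profit")
        conn.execute("""
            INSERT INTO monthly_profit (month, total_profit)
            SELECT substr(sold_on, 1, 7), SUM(profit)
            FROM sales
            GROUP BY substr(sold_on, 1, 7)
        """)

refresh_monthly_profit(conn)

# Reports read the small pre-computed table instead of aggregating sales
summary = conn.execute(
    "SELECT month, total_profit FROM monthly_profit ORDER BY month"
).fetchall()
print(summary)
```

Rebuilding the whole table is the simplest option; with larger data you would probably refresh only the months that changed.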
In short, adding another ordinary database in this situation will not give much performance benefit unless it is done right (separate hard disk, separate server, etc). However, a second database does bring benefits in maintainability and security.