Are staging tables an anti-pattern that is used when rpc (such as Java RMI or some kind of Web Service call) or messaging queue (such as JMS) would be a better solution, or are there problems better served by staging tables?
To clarify:
By staging tables I mean those cases where records are appended to a table or tables by a process which is then read by and acted on by second process or processes. I am not referring to tables which tables which are meant to reflect end of interval status (end of day, end of pay period etc). In most cases, the schema of the staging tables closely mimics an application data type(s) such as customer or account.
Potential causes for this anti-pattern:
1) Business Unit Wall between owners of the two processes prevents process that writes to or reads from staging being modified.
2) Low confidence in process that writes to or reads from staging leads developers to use table to prevent data loss "in case something fails"
3) Lack of knowledge or DGAS (don't give a ^%$@) attitude
Staging tables, as you describe are an essential part of most data warehouse or BI environments. You could argue that reliable/resilient rpc would do the same job, but I think you'd be incorrect.
By pulling data to a staging table, you're moving it out of the production environment, potentially to do further calculation, summary, re-index, re-keying and so on, the majority of these are acheived 'in database'. Replacing this with an RPC you're moving the code and CPU cycles out of the DB and into an app server for no real benefit. For instance an app server has a much higher chance of crashing - you can't (easily) rollback an RPC.
Of course there are many ways of moving data reliably between systems, staging tables just happen to be one of the easiest, most performant, reliable and in development terms cheapest, doesn't always mean they're the right approach - but more often than not.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With