I'm trying to model my data warehouse using a star schema but I have a problem to avoid joins between fact tables.
To give a trivial idea of my problem, I want to collect all the events who occur on my operating system. So, I can create a fact table event
with some dimensions like datetime
or user
. The problem is I want to collect different kinds of event: hardware event and software event.
The problem is those events have not the same dimensions. By instance, for a hardware event, I can have physical_component
or related_driver
dimensions and, for a software event, software_name
or online_application
dimensions (that is just some examples, the idea to keep in mind is the fact event
can be specialized into some specific events with specific dimensions).
In a relational model, I would have 3 tables organized like that:
The problem is : how to handle joins between fact tables in a star schema?
JOIN
SQL statement in all of our queries.
NULL
value.
event
and in its own table. Because we will store a huge quantity of data, I'm not sure this duplication is a good idea.
From your description and your subsequent comments to other answers, I'd say that option 2 or option 4 are the right way to model things from a dimensional modelling perspective. Each fact should be a measure of a business process, and the dimensionality of software and hardware events seems to be sufficiently different that they warrant being stored separately.
Then, there's a case for also storing the separate events table as a view, materialised view, or plain-ol' table storing the things that are common.
Once you've decided that's the right way to model things 'logically', you then need to balance performance, maintainability, usability and storage. For dimensional modelling, usability and performance of queries take top priority (otherwise you may as well not use a dimensional model at all), and the extra work in ETL, and extra space needed, are prices worth paying.
A non-materialised view would save you the space at the price of performance, but it could be that you could give it a sufficiently awesome index or two that would mitigate that. A materialised view will give you performance at the price of storage.
I'd be tempted to create the two fact tables with indexes and a non-materialised view, and see what performance of that is like before taking further performance enhancing steps. 10 million fact rows isn't so bad, it might still perform.
A materialized view can be queried directly. But if you want to, you can use the query rewrite capabilities of Oracle so that the Materialized view is instead used as a performance-enhancer, like an index, when you're querying the original tables. See here for details: http://www.sqlsnippets.com/en/topic-12918.html Whether you choose to use it in query rewrite mode or just as a view in its own right depends on whether you want the users to know about this extra table, or for it to just sit in the background as a helpful friend.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With