I have a PostgreSQL database containing various education data such school-level test scores and enrollment figures. I need to separate enrollment from test scores because the data is on different grains. Even though enrollment is on a different granularity from the test-score data, many of the dimensions are the same. For example, I have:
~ ---------------------------------------------------------------------------------~
| Test Scores Fact |
|-------------|-----------|----------|-----------|--------------|------------|-----|
| school_code | test_code | grade_id | gender_id | ethnicity_id | subject_id | ... |
|-------------|-----------|----------|-----------|--------------|------------|-----|
~ --------------------------------------------------------~
| Enrollment Fact |
|-------------|----------|-----------|--------------|-----|
| school_code | grade_id | gender_id | ethnicity_id | ... |
|-------------|----------|-----------|--------------|-----|
This structure is fine on the backend, but in Qlikview, this creates a synthetic key. The solution for synthetic keys seems to usually be replacing it with a link table via Qlikview scripting, which has been my approach as well. But this does not seem to scale, as when I add a third fact table (on yet another grain) that contains more of the same dimensions, if I create another link table, now my two link tables start to associate as they contain several commonly named fields, and Qlikview's response is to create more synthetic keys?
I'm relatively new to Qlikview and am working by myself. How are multiple facts of different grains with common dimensions usually handled?
EDIT:
I've provided my solution to this problem which has been working in a production environment for just under a year! See my answer below...
Seeing the popularity of this question, I'm going to add my actual solution to the mix so people have an example to work from, which for some reason is really hard to find for such a common problem...
I proceeded with creating a Link Table. This solution still to this day feels like a hack as it creates one huge table containing the Cartesian product of every one of your keys in all of your fact tables... but it does work.
The problem: You have multiple fact tables in your database; an occurrence in almost every database ever. Some (or all) of these fact tables share the same key fields; no problem, right? Wrong. Unfortunately, due to Qlik's associative nature, instead of each one of your fact tables linking nicely to their lookup tables, your fact tables are now associating with each other and wreaking havoc on your data model; creating circular references and untold amounts of synthetic keys.
The Solution: Create a Link Table. Sounds simple, right? Well, it is, but it's also very poorly documented and hard to understand without an initial explanation. You might be wondering... what's a Link Table? It is the Cartesian product of all keys from all of your fact tables. How does this correct the problem? It removes all of the unwanted associations between your fact tables as each will now only contain a single unique concatenated key. Those unique keys will associate with the Link Table only, which contains all your unique concatenated keys as well as all the individual keys. The Link Table will subsequently associate with your lookup tables and all will be well.
Implementation:
This implementation will use the two tables contained in my question above; test_scores_fact
and enrollment_fact
.
test_scores_fact | enrollment_fact | school | gender | ...
---------------- | --------------- | ------ | ------ | ---
school_code (FK) | school_code (FK) | school_code (PK) | gender_id (PK) |
test_code (FK) | grade_id (FK) | school_name (FK) | gender_desc |
grade_id (FK) | ethnicity_id (FK) | address | ... |
gender_id (FK) | gender_id (FK) | ... |
ethnicity_id (FK) | number_enrolled (F) |
subject_id (FK) |
test_score (F) |
FK = Foreign Key
PK = Primary Key
F = Fact
As you can see, the two fact tables have overlapping keys, school_code
, grade_id
, gender_id
, and ethnicity_id
. In a relational model, each key field has a corresponding table with additional information about the key. This model does not jive with Qlikview's associative nature as Qlikview associates tables based on the name of a field; even when you don't want it to. You do want like named fields to associate to their lookup tables, however you don't want like named fields in your fact tables to associate. Unfortunately you cannot stop this behavior. You must implement a Link Table...
In your Qlikview script, create a temporary fact table, which loads in all fields from your database table:
[temp_test_scores]:
LOAD school_code,
test_code,
grade_id,
gender_id,
ethnicity_id,
subject_id,
test_score;
SQL SELECT * FROM <database connection>
Concatenate your keys and remove all individual keys:
[test_scores]:
LOAD school_code & '_' test_code & '_' grade_id & '_' gender_id & '_' ethnicity_id & '_' subject_id as test_key,
test_score
RESIDENT [temp_test_scores];
Repeat Steps 1 & 2 for each fact table:
[temp_enrollment]:
LOAD school_code,
grade_id,
ethnicity_id,
gender_id,
number_enrolled;
SQL SELECT * FROM <database connection>
[enrollment]:
LOAD school_code & '_' & grade_id & '_' & ethnicity_id & '_' & gender_id as enrollment_key,
number_enrolled
RESIDENT [temp_enrollment];
Create your Link Table by concatenating your individual keys into a single table:
[temp_link_table]:
LOAD DISTINCT
school_code,
test_code,
grade_id,
gender_id,
ethnicity_id,
subject_id
RESIDENT [temp_test_scores];
CONCATENATE ([temp_link_table])
LOAD DISTINCT
school_code,
grade_id,
ethnicity_id,
gender_id,
number_enrolled
RESIDENT [temp_enrollment];
/**
* The final Link Table will contain all of the individual keys one time as well as your concatenated keys
*/
[link_table]:
LOAD DISTINCT
school_code,
test_code,
grade_id,
gender_id,
ethnicity_id,
subject_id,
school_code & '_' test_code & '_' grade_id & '_' gender_id & '_' ethnicity_id & '_' subject_id as test_key,
school_code & '_' & grade_id & '_' & ethnicity_id & '_' & gender_id as enrollment_key
RESIDENT [temp_link_table]
Drop your temp tables so they do not appear in your data model:
DROP TABLE [temp_test_scores];
DROP TABLE [temp_enrollment];
DROP TABLE [temp_link_table];
This will remove all associations between your fact tables as there now exist no common field names between them. Each fact table will link to the link table via the created concatenated key. The link table will then associate with each individual lookup table. Your Qlikview data model will not contain any synthetic keys or circular references.
If you create another fact table in the future, just follow steps 1 & 2 again, and add any new individual keys to the Link Table and also add the new concatenated key to the Link Table as well. It scales with little effort.
Good Luck!
There are two main strategies to modelling data in QlikView to handle multiple fact tables:
Append your fact tables into one single fact table - usually referred to as a CONCATENATED FACT as QlikView's syntax for appending data to tables is by use of the CONCATENATE prefix (the equivalent of a SQL UNION operation)
Build a link table (what you have done so far) For a majority of implementations, option 1 is the appropriate method. Attributes of a CONCATENATED fact can be summarised as:
Positives:
Negatives:
How to construct Link Tables is a complex subject but relies upon traditional database linking table design techniques. It is easy to go wrong and produce linking tables that may seem to produce the correct results in the front-end but is excessively large, consuming memory and CPU resources.
In my experience, a poorly modelled QlikView data model is the most common culprit for causing poor performance.
I hope this quick, far from exhaustive, introduction to multi-fact modelling in QlikView proves of some help and sets you on the right course.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With