Let's say I am trying to build an opinion poll app, such that I can create a template of an opinion poll, give it multiple sections/questions, assign multiple people to different copies of a given question, create varying measures (happyness, succesfulness, greenness) and assign different questions different weights to apply to all of these measures.
Something like so:
CREATE TABLE users (
id SERIAL NOT NULL PRIMARY KEY
);
CREATE TABLE opinion_poll_templates (
id SERIAL NOT NULL PRIMARY KEY
);
CREATE TABLE opinion_poll_instances (
id SERIAL NOT NULL PRIMARY KEY,
template_id INTEGER NOT NULL REFERENCES opinion_poll_templates(id)
);
CREATE TABLE section_templates (
id SERIAL NOT NULL PRIMARY KEY,
opinion_poll_id INTEGER NOT NULL REFERENCES opinion_poll_templates(id)
);
CREATE TABLE section_instances (
id SERIAL NOT NULL PRIMARY KEY,
opinion_poll_id INTEGER NOT NULL REFERENCES opinion_poll_instances(id),
template_id INTEGER NOT NULL REFERENCES section_templates(id)
);
CREATE TABLE question_templates (
id SERIAL NOT NULL PRIMARY KEY,
section_id INTEGER NOT NULL REFERENCES section_templates(id)
);
CREATE TABLE measure_templates (
id SERIAL NOT NULL PRIMARY KEY,
opinion_poll_id INTEGER NOT NULL REFERENCES opinion_poll_templates(id)
);
CREATE TABLE answer_options (
id SERIAL NOT NULL PRIMARY KEY,
question_template_id INTEGER NOT NULL REFERENCES question_templates(id),
weight FLOAT8
);
CREATE TABLE question_instances (
id SERIAL NOT NULL PRIMARY KEY,
template_id INTEGER NOT NULL REFERENCES question_templates(id),
opinion_poll_id INTEGER NOT NULL REFERENCES opinion_poll_instances(id),
section_id INTEGER NOT NULL REFERENCES section_instances(id),
answer_option_id INTEGER NOT NULL REFERENCES answer_options(id),
contributor_id INTEGER
);
CREATE TABLE measure_instances (
id SERIAL NOT NULL PRIMARY KEY,
opinion_poll_id INTEGER NOT NULL REFERENCES opinion_poll_instances(id),
template_id INTEGER NOT NULL REFERENCES measure_templates(id),
total_score INTEGER
);
CREATE TABLE scores (
id SERIAL NOT NULL PRIMARY KEY,
question_template_id INTEGER NOT NULL REFERENCES question_templates(id),
measure_template_id INTEGER NOT NULL REFERENCES measure_templates(id),
score INTEGER NOT NULL
);
Now let's say I am interested in the per measureInstance (per measure assigned to an opinion poll) cross question, cross user average?
WITH weighted_score AS (
SELECT AVG(answer_options.weight), measure_instances.id
FROM question_instances
INNER JOIN answer_options ON question_instances.template_id = answer_options.question_template_id
INNER JOIN scores ON question_instances.template_id = scores.question_template_id
INNER JOIN measure_instances ON measure_instances.template_id=scores.measure_template_id
WHERE measure_instances.opinion_poll_id = question_instances.opinion_poll_id
GROUP BY measure_instances.id
)
UPDATE measure_instances
SET total_score=(SELECT avg FROM weighted_score
WHERE weighted_score.id = measure_instances.id)*100
RETURNING total_score;
This seems to not only not group as expected, but produced incorrect results.
Why is the result an integer rather then a float? Why is the result not being grouped by measure instance instead being identical across all? And why is the result incorrect for any of them?
A demonstration: http://sqlfiddle.com/#!15/dcce8/1
EDIT: In working through explaining exactly what I wanted, I realized the source of my problem was that I was simply adding percentages, rather then normalizing across questions as a percentage.
My new and improved sql is:
WITH per_question_percentage AS (
SELECT SUM(answer_options.weight)/COUNT(question_instances.id) percentage, question_templates.id qid, opinion_poll_instances.id oid
FROM question_instances
INNER JOIN answer_options ON question_instances.answer_option_id = answer_options.id
INNER JOIN question_templates ON question_templates.id = question_instances.template_id
INNER JOIN opinion_poll_instances ON opinion_poll_instances.id = question_instances.opinion_poll_id
GROUP BY question_templates.id, opinion_poll_instances.id
), max_per_measure AS (
SELECT SUM(scores.score), measure_instances.id mid, measure_instances.opinion_poll_id oid
FROM measure_instances
INNER JOIN scores ON scores.measure_template_id=measure_instances.template_id
GROUP BY measure_instances.id, measure_instances.opinion_poll_id
), per_measure_per_opinion_poll AS (
SELECT per_question_percentage.percentage * scores.score score, measure_instances.id mid, measure_instances.opinion_poll_id oid
FROM question_instances
INNER JOIN scores ON question_instances.template_id = scores.question_template_id
INNER JOIN measure_instances ON measure_instances.template_id = scores.measure_template_id
INNER JOIN max_per_measure ON measure_instances.id = max_per_measure.mid
INNER JOIN per_question_percentage ON per_question_percentage.qid = question_instances.template_id
WHERE measure_instances.opinion_poll_id = question_instances.opinion_poll_id AND question_instances.opinion_poll_id = per_question_percentage.oid
GROUP BY measure_instances.id, measure_instances.opinion_poll_id, per_question_percentage.percentage, scores.score
)
UPDATE measure_instances
SET total_score = subquery.result*100
FROM (SELECT SUM(per_measure_per_opinion_poll.score)/max_per_measure.sum result, per_measure_per_opinion_poll.mid, per_measure_per_opinion_poll.oid
FROM max_per_measure, per_measure_per_opinion_poll
WHERE per_measure_per_opinion_poll.mid = max_per_measure.mid
AND per_measure_per_opinion_poll.oid = max_per_measure.oid
GROUP BY max_per_measure.sum, per_measure_per_opinion_poll.mid, per_measure_per_opinion_poll.oid)
AS subquery(result, mid, oid)
WHERE measure_instances.id = subquery.mid
AND measure_instances.opinion_poll_id = subquery.oid
RETURNING total_score;
Is this canonical sql? Is there anything I should be aware of with this kind of CTE chaining (or otherwise)? Is there a more efficient way to achieve the same thing?
This is a bit long for a comment.
I don't understand the questions.
Why is the result an integer rather then a float?
Because measure_instances.total_score
is an integer and that is what the returning
clause is returning.
Why is the result not being grouped by measure instance instead being identical across all?
When I run the CTE independently, the values are 0.45. The data and logic dictate the same values.
And why is the result incorrect for any of them?
I think you mean "for all of them". In any case, the results look correct to me.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With