I have a database table (Oracle 11g) of questionnaire feedback, including multiple choice, multiple answer questions. The Options column has each value the user could choose, and the Answers column has the numerical values of what they chose.
ID_NO OPTIONS ANSWERS
1001 Apple Pie|Banana-Split|Cream Tea 1|2
1002 Apple Pie|Banana-Split|Cream Tea 2|3
1003 Apple Pie|Banana-Split|Cream Tea 1|2|3
I need a query that will decode the answers, with the text versions of the answers as a single string.
ID_NO ANSWERS ANSWER_DECODE
1001 1|2 Apple Pie|Banana-Split
1002 2|3 Banana-Split|Cream Tea
1003 1|2|3 Apple Pie|Banana-Split|Cream Tea
I have experimented with regular expressions to replace values and get substrings, but I can't work out a way to properly merge the two.
WITH feedback AS (
SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL )
SELECT
id_no,
options,
REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, 2) second_option,
answers,
REGEXP_REPLACE(answers, '(\d)+', ' \1 ') answer_numbers,
REGEXP_REPLACE(answers, '(\d)+', REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, To_Number('2'))) "???"
FROM feedback
I don't want to have to manually define or decode the answers in the SQL; there are many surveys with different questions (and differing numbers of options), so I'm hoping that there's a solution that will dynamically work for all of them.
I've tried to split the options and answers into separate rows by LEVEL, and re-join them where the codes match, but this runs exceedingly slow with the actual dataset (a 5-option question with 600 rows of responses).
WITH feedback AS (
SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL )
SELECT
answer_rows.id_no,
ListAgg(option_rows.answer) WITHIN GROUP(ORDER BY option_rows.lvl)
FROM
(SELECT DISTINCT
LEVEL lvl,
REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, LEVEL) answer
FROM
(SELECT DISTINCT
options,
REGEXP_COUNT(options||'|', '(.)+?\|') num_choices
FROM
feedback)
CONNECT BY LEVEL <= num_choices
) option_rows
LEFT OUTER JOIN
(SELECT DISTINCT
id_no,
to_number(REGEXP_SUBSTR(answers, '(\d)+', 1, LEVEL)) answer
FROM
(SELECT DISTINCT
id_no,
answers,
To_Number(REGEXP_SUBSTR(answers, '(\d)+$')) max_answer
FROM
feedback)
WHERE
to_number(REGEXP_SUBSTR(answers, '(\d)+', 1, LEVEL)) IS NOT NULL
CONNECT BY LEVEL <= max_answer
) answer_rows
ON option_rows.lvl = answer_rows.answer
GROUP BY
answer_rows.id_no
ORDER BY
answer_rows.id_no
If there isn't a solution just using Regex, is there a more efficient way than LEVEL to split the values? Or is there another approach that would work?
It's slow because you're expanding each row too many times; the connect-by clauses you're using are looking across all the rows, so you're ending up with a huge amount of data to then sort - which is presumably why you ended up with the DISTINCT
in there.
You can add two PRIOR
clauses to the connect-by, firstly so the ID_NO
is preserved, and a second to avoid a loop - any non-deterministic function will do for this, I've picked dbms_random.value
but you can use sys_guid
if you prefer, or something else. You also don't need to many subqueries, you can do it with two; or as CTEs which I think it s a bit clearer:
WITH feedback AS (
SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL
),
option_rows AS (
SELECT
id_no,
LEVEL answer,
REGEXP_SUBSTR(options, '[^|]+', 1, LEVEL) answer_text
FROM feedback
CONNECT BY LEVEL <= REGEXP_COUNT(options, '[^|]+')
AND id_no = PRIOR id_no
AND PRIOR dbms_random.value IS NOT NULL
),
answer_rows AS (
SELECT
id_no,
REGEXP_SUBSTR(answers, '[^|]+', 1, LEVEL) answer
FROM feedback
CONNECT BY LEVEL <= REGEXP_COUNT(answers, '[^|]+')
AND PRIOR id_no = id_no
AND PRIOR dbms_random.value IS NOT NULL
)
SELECT
option_rows.id_no,
LISTAGG(option_rows.answer, '|') WITHIN GROUP (ORDER BY option_rows.answer) AS answers,
LISTAGG(option_rows.answer_text, '|') WITHIN GROUP (ORDER BY option_rows.answer) AS answer_decode
FROM option_rows
JOIN answer_rows
ON option_rows.id_no = answer_rows.id_no
AND option_rows.answer = answer_rows.answer
GROUP BY option_rows.id_no
ORDER BY option_rows.id_no;
Which gets:
ID_NO ANSWERS ANSWER_DECODE
---------- ---------- ----------------------------------------
1001 1|2 Apple Pie|Banana-Split
1002 2|3 Banana-Split|Cream Tea
1003 1|2|3 Apple Pie|Banana-Split|Cream Tea
I've also changed your regex pattern so you don't have to append or strip the |
.
Check out this compact solution:
with sample_data as
(
select 'ala|ma|kota' options, '1|2' answers from dual
union all
select 'apples|oranges|bacon', '1|2|3' from dual
union all
select 'a|b|c|d|e|f|h|i','1|3|4|5|8' from dual
)
select answers, options,
regexp_replace(regexp_replace(options,'([^|]+)\|([^|]+)\|([^|]+)','\' || replace(answers,'|','|\')),'[|]+','|') answer_decode
from sample_data;
Output:
ANSWERS OPTIONS ANSWER_DECODE
--------- -------------------- ---------------------------
1|2 ala|ma|kota ala|ma
1|2|3 apples|oranges|bacon apples|oranges|bacon
1|3|4|5|8 a|b|c|d|e|f|h|i a|c|d|f|h|i
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With