Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Regex - Replace with substring from another field

I have a database table (Oracle 11g) of questionnaire feedback, including multiple choice, multiple answer questions. The Options column has each value the user could choose, and the Answers column has the numerical values of what they chose.

ID_NO     OPTIONS                               ANSWERS
1001      Apple Pie|Banana-Split|Cream Tea      1|2
1002      Apple Pie|Banana-Split|Cream Tea      2|3
1003      Apple Pie|Banana-Split|Cream Tea      1|2|3

I need a query that will decode the answers, with the text versions of the answers as a single string.

ID_NO     ANSWERS     ANSWER_DECODE
1001      1|2         Apple Pie|Banana-Split
1002      2|3         Banana-Split|Cream Tea
1003      1|2|3       Apple Pie|Banana-Split|Cream Tea

I have experimented with regular expressions to replace values and get substrings, but I can't work out a way to properly merge the two.

WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
  SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
  SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL )
SELECT 
  id_no,
  options,
  REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, 2) second_option,
  answers,
  REGEXP_REPLACE(answers, '(\d)+', ' \1 ') answer_numbers,
  REGEXP_REPLACE(answers, '(\d)+', REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, To_Number('2'))) "???"
FROM feedback

I don't want to have to manually define or decode the answers in the SQL; there are many surveys with different questions (and differing numbers of options), so I'm hoping that there's a solution that will dynamically work for all of them.

I've tried to split the options and answers into separate rows by LEVEL, and re-join them where the codes match, but this runs exceedingly slow with the actual dataset (a 5-option question with 600 rows of responses).

WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
  SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
  SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL )
SELECT
    answer_rows.id_no,
    ListAgg(option_rows.answer) WITHIN GROUP(ORDER BY option_rows.lvl)
FROM
  (SELECT DISTINCT
    LEVEL lvl,
    REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, LEVEL) answer
  FROM
    (SELECT DISTINCT
      options,
      REGEXP_COUNT(options||'|', '(.)+?\|') num_choices
    FROM
      feedback)
  CONNECT BY LEVEL <= num_choices
  ) option_rows
  LEFT OUTER JOIN
  (SELECT DISTINCT
    id_no,
    to_number(REGEXP_SUBSTR(answers, '(\d)+', 1, LEVEL)) answer
  FROM
    (SELECT DISTINCT
      id_no,
      answers,
      To_Number(REGEXP_SUBSTR(answers, '(\d)+$')) max_answer
    FROM
      feedback)
  WHERE
    to_number(REGEXP_SUBSTR(answers, '(\d)+', 1, LEVEL)) IS NOT NULL
  CONNECT BY LEVEL <= max_answer
  ) answer_rows
    ON option_rows.lvl = answer_rows.answer
GROUP BY
    answer_rows.id_no
ORDER BY
  answer_rows.id_no

If there isn't a solution just using Regex, is there a more efficient way than LEVEL to split the values? Or is there another approach that would work?

like image 967
Pete Rennard-Cumming Avatar asked Jan 25 '16 16:01

Pete Rennard-Cumming


2 Answers

It's slow because you're expanding each row too many times; the connect-by clauses you're using are looking across all the rows, so you're ending up with a huge amount of data to then sort - which is presumably why you ended up with the DISTINCT in there.

You can add two PRIOR clauses to the connect-by, firstly so the ID_NO is preserved, and a second to avoid a loop - any non-deterministic function will do for this, I've picked dbms_random.value but you can use sys_guid if you prefer, or something else. You also don't need to many subqueries, you can do it with two; or as CTEs which I think it s a bit clearer:

WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
  SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
  SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL
),
option_rows AS (
  SELECT
    id_no,
    LEVEL answer,
    REGEXP_SUBSTR(options, '[^|]+', 1, LEVEL) answer_text
  FROM feedback
  CONNECT BY LEVEL <= REGEXP_COUNT(options, '[^|]+')
  AND id_no = PRIOR id_no
  AND PRIOR dbms_random.value IS NOT NULL
),
answer_rows AS (
  SELECT
    id_no,
    REGEXP_SUBSTR(answers, '[^|]+', 1, LEVEL) answer
  FROM feedback
  CONNECT BY LEVEL <= REGEXP_COUNT(answers, '[^|]+')
  AND PRIOR id_no = id_no
  AND PRIOR dbms_random.value IS NOT NULL
)
SELECT
  option_rows.id_no,
  LISTAGG(option_rows.answer, '|') WITHIN GROUP (ORDER BY option_rows.answer) AS answers,
  LISTAGG(option_rows.answer_text, '|') WITHIN GROUP (ORDER BY option_rows.answer) AS answer_decode
FROM option_rows
JOIN answer_rows
ON option_rows.id_no = answer_rows.id_no
AND option_rows.answer = answer_rows.answer
GROUP BY option_rows.id_no
ORDER BY option_rows.id_no;

Which gets:

     ID_NO ANSWERS    ANSWER_DECODE                          
---------- ---------- ----------------------------------------
      1001 1|2        Apple Pie|Banana-Split                  
      1002 2|3        Banana-Split|Cream Tea                  
      1003 1|2|3      Apple Pie|Banana-Split|Cream Tea  

I've also changed your regex pattern so you don't have to append or strip the |.

like image 56
Alex Poole Avatar answered Sep 30 '22 18:09

Alex Poole


Check out this compact solution:

   with sample_data as
(
  select 'ala|ma|kota' options, '1|2' answers from dual
  union all
  select 'apples|oranges|bacon', '1|2|3' from dual
  union all
  select 'a|b|c|d|e|f|h|i','1|3|4|5|8' from dual
)
select answers, options,
regexp_replace(regexp_replace(options,'([^|]+)\|([^|]+)\|([^|]+)','\' || replace(answers,'|','|\')),'[|]+','|') answer_decode
from sample_data;

Output:

  ANSWERS   OPTIONS              ANSWER_DECODE
--------- -------------------- ---------------------------
1|2       ala|ma|kota          ala|ma
1|2|3     apples|oranges|bacon apples|oranges|bacon
1|3|4|5|8 a|b|c|d|e|f|h|i      a|c|d|f|h|i
like image 1
Piotr Siekierski Avatar answered Sep 30 '22 17:09

Piotr Siekierski