Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex: Detect any substring of the alphabet?

Tags:

regex

oracle

First step

  • With "the alphabet" defined as ABCDEFGHIJKLMNOPQRSTUVWXYZ, I want to find any substring of the alphabet. I need to build more from here, but this is my first challenge.

End-goal

  • Given a pattern of characters (A-Z) with no repetitions and no whitespace and only incrementing characters (ABDE, never ABED), replace all missing characters in the alphabet with a single space within an Oracle statement. So, a column of a row might read ABCDEGHIJKLMNOPQTUVWXYZ (F and RS are missing) and it needs to read ABCDE GHIJKLMNOPQ TUVWXYZ.

Is this even possible?

David

like image 717
DavidZemon Avatar asked Jan 09 '23 18:01

DavidZemon


1 Answers

For a single value you can use two connect-by clauses; one to generate 26 values, the other to split the original string into individual characters. Since the ASCII codes are consecutive, the ascii() function can be used to generate a number from 1-26 for each of the characters that are present. Then left-join the two lists:

var str varchar2(26);
exec :str := 'ABCDFGZ';

with alphabet as (
  select level as pos
  from dual connect by level <= 26
),
chars as (
  select substr(:str, level, 1) character,
    ascii(substr(:str, level, 1)) - 64 as pos
  from dual connect by level <= length(:str)
)
select listagg(nvl(chars.character, ' '))
  within group (order by alphabet.pos) as result
from alphabet
left outer join chars on chars.pos = alphabet.pos;

RESULT                   
--------------------------
ABCD FG                  Z 

This is with an SQL*Plus bind variable to avoid repeating the string, but it could be plugged in from somewhere else.

It's a bit more complicated to create a view as the multiple rows from the table can cause problems with the connect-by. The list of possible values has to include a primary (or unique, at least) key from the table, and the original string if you want to include that (and any other columns you want from the table). The split list needs to also include the primary key, and that needs to be included in the outer join.

create view v42 as
with possible as (
  select id, str, level as pos
  from t42
  connect by level <= 26
  and prior id = id
  and prior sys_guid() is not null
),
actual as (
  select id, substr(str, level, 1) character,
    ascii(substr(str, level, 1)) - 64 as pos
  from t42
  connect by level <= length(str)
  and prior id = id
  and prior sys_guid() is not null
)
select possible.id, possible.str, listagg(nvl(actual.character, ' '))
  within group (order by possible.pos) as result
from possible
left outer join actual on actual.id = possible.id and actual.pos = possible.pos
group by possible.id, possible.str;

Then with some sample data, select * from v42 gives:

        ID STR                        RESULT                   
---------- -------------------------- --------------------------
         1 A                          A                          
         2 Z                                                   Z 
         3 AZ                         A                        Z 
         4 ABCDFGZ                    ABCD FG                  Z 
         5 ABCDEGHIJKLMNOPQTUVWXYZ    ABCDE GHIJKLMNOPQ  TUVWXYZ 

SQL Fiddle demo.

It might be a little cleaner with a recursive CTE instead. Or with a function that works on one value at a time. Or with a regex, of course...

Here's a recursive CTE version, for fun:

create view v42 as
with possible(id, str, pos, character) as (
  select id, str, 1, 'A'
  from t42
  union all
  select id, str, pos + 1, chr(64 + pos + 1)
  from possible
  where pos < 26
),
actual (id, str, pos, character) as (
  select id, str, 1, substr(str, 1, 1)
  from t42
  union all
  select id, str, pos + 1, substr(str, pos + 1, 1)
  from actual
  where pos < length(str)
)
select possible.id, possible.str, listagg(nvl(actual.character, ' '))
  within group (order by possible.pos) as result
from possible
left outer join actual
on actual.id = possible.id
and actual.character = possible.character
group by possible.id, possible.str;

(SQL Fiddle does odd things with spacing, so view as plaintext output, from the 'Run SQL' drop-down.)

like image 198
Alex Poole Avatar answered Jan 17 '23 23:01

Alex Poole