Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

String sequential grouping in Oracle

My task is to group similar digits/characters in a given string, For e.g.: The output of the SQL for the string 4455599 should be 44 555 99 and is working with following query:

with t(str)
as (
  select '4455599' from dual
)
select listagg(str_grouped ,' ') within group (order by rownum) str_split
from(
    select listagg ( str) within group ( order by lvl) str_grouped
    from(
        select level lvl, 
               substr(str,level,1) str, 
               dense_rank() over( order by substr(str,level,1)) drank_no
        from t
          connect by level <= length(str)
    )
 group by drank_no
  );

But the query fails with the following data as I am currently using dense_rank.

445559944, expected 44 555 99 44 but getting 4444 555 99.

bb119911, expected bb 11 99 11 but getting 1111 99 bb.

Help me on this, welcome all regular expression query.

like image 760
ajmalmhd04 Avatar asked May 19 '15 05:05

ajmalmhd04


1 Answers

Backreferences to the rescue:

select 
    regexp_replace('4455599', '((.)\2*)', '\1 ')
from dual;

Output:

44 555 99 

Explanation

((.)\2*) defines two capturing groups where:

(.) matches any single character and captures it in group 2.

\2* is a backreference to the character captured in group 2, which matches the same character zero or more times.

((.)\2*) therefore matches a sequence of one-or-more of the same character and captures the sequence in group 1.

\1 replaces the characters matched with the contents of group 1, followed by a space.

Backreferences are counted from left-to-right starting with 1 (group 0 is the entire match). So if you have the pattern (((a)b)c)d the innermost (a) is group 3, ((a)b) is group 2, (((a)b)c) is group 1, and if you're using a normal regex engine (not oracle's), the entire pattern (((a)b)c)d is captured in group 0.

Test cases

select 
    val, regexp_replace(val, '((.)\2*)', '\1 ') as result
from (
    select '445559944' as val from dual
    union all select 'bb119911' as val from dual
    union all select '46455599464' as val from dual
) foo;

Output:

VAL         RESULT
----------- ------------------
445559944   44 555 99 44 
bb119911    bb 11 99 11 
46455599464 4 6 4 555 99 4 6 4 
like image 60
beerbajay Avatar answered Oct 08 '22 04:10

beerbajay