Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Splitting one row into multiple rows by breaking down the text after fixed length in oracle

I have a table which contains data something like this:

CREATE TABLE UDA_DATA 
( uda VARCHAR2(20), 
value_text VARCHAR2(4000) 
); 

Insert into UDA_DATA values('Material_ID','PBL000129 PBL000132 PBL000130 PBL000131 PBL000133'); 
Insert into UDA_DATA values('Material_ID','PBL0001341 PBL0001381 PBL0001351 PBL0001361 PBL0001371'); 
   commit;

Now if we select the data from this table it will give the result something like this:

select * from UDA_DATA;

It gives result something like this:

enter image description here

But however I am expecting something like this:

enter image description here

Means it should break the value_text into two or more rows if the character length is more than 30. Also, uda column should have the suffix as 1,2..n and it should not break the text in between.

Wrote a recurvise CTE to acheive the result:

with rcte (rn, uda, value, chunk_num, value_text) as (
  select rownum,
    uda,
    substr(value_text, 1, 30),
    1,
    substr(value_text, 31)
  from uda_data
  union all
  select rn,
    uda,
    substr(value_text, 1, 30),
    chunk_num + 1,
    substr(value_text, 31)
  from rcte
  where value_text is not null
)
select uda || chunk_num as uda, value
from rcte
order by rn, chunk_num;

Which is giving the result like this:

enter image description here

In third row, it's breaking the text in between which is not correct, want a result something this:

enter image description here

Anyhelp will be highly appreciated.

like image 521
Shashank Jha Avatar asked Dec 28 '25 04:12

Shashank Jha


2 Answers

Here's one way:

  • Split the strings into words
  • Recombine the words up to a length of 30 characters

The rows have the same values for uda, so I've also given these each a row_number first so you can tell them apart.

I've used match_recogonize to group together the rows up to thirty characters long. Then listagg to recombine the words in each group.

You could adapt recursive with to do this too

Which gives:

with ranks as (
  select u.*,
         row_number () over (
           order by uda, value_text
         ) rk
  from   uda_data u
), rws as (
  select rk, uda, rn,
         regexp_substr ( value_text, '[^ ]+', 1, rn ) || ' ' str
  from   ranks, lateral (
    select level rn from dual
    connect by level <= regexp_count ( value_text, ' ' ) + 1
  ) 
), grps as (
  select *
  from   rws
    match_recognize (
      partition by rk
      order by rn
      measures
        match_number() as grp,
        sum ( length ( str ) ) as len
      all rows per match 
      pattern ( thirty+ )
      define 
        thirty as sum ( length ( str ) ) <= 30
    )
)
  select uda || grp, 
         listagg ( str ) 
           within group ( order by rn ) strs 
  from   grps
  group by rk, uda || grp;
  
UDA||GRP        STRS                             
Material_ID1    PBL000129 PBL000132 PBL000130     
Material_ID2    PBL000131 PBL000133               
Material_ID1    PBL0001341 PBL0001381             
Material_ID2    PBL0001351 PBL0001361             
Material_ID3    PBL0001371 

Note: the splitting-recombining trick increases the number of rows you process. If the input strings will be long, this method could be very slow. If the majority will split into 2-3 groups, this is probably OK - though obviously test on your data!

like image 105
Chris Saxon Avatar answered Dec 31 '25 00:12

Chris Saxon


The problem with your query is that while you want to split the string on blanks, you are not even trying to do that. You are using substr(value_text, 1, 30) instead, which doesn't care for where the blanks are located in the string.

You can use INSTR to find the split position. Below query works fine, as long as there is no code inside that is longer than 30 characters (in which case INSTR returns -1 which leads to a cycle in the recursive query). You may want to adjust the query for this case.

with rcte (rn, uda, value, chunk_num, value_text) as (
  select rownum,
    uda,
    substr(trim(value_text), 1, instr(substr(value_text || ' ', 1, 31), ' ', -1) - 1),
    1,
    trim(substr(trim(value_text), instr(substr(value_text || ' ', 1, 31), ' ', -1) + 1))
  from uda_data
  union all
  select rn,
    uda,
    substr(value_text, 1, instr(substr(value_text || ' ', 1, 31), ' ', -1) - 1),
    chunk_num + 1,
    trim(substr(value_text, instr(substr(value_text || ' ', 1, 31), ' ', -1) + 1))
  from rcte
  where value_text is not null
)
select uda || rn || '/' || chunk_num as uda, value
from rcte
order by rn, chunk_num;

Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=8dfc8e55a12c4666b4bc7bfcaceea2d2

like image 27
Thorsten Kettner Avatar answered Dec 31 '25 00:12

Thorsten Kettner