I have a table which contains data something like this:
CREATE TABLE UDA_DATA
( uda VARCHAR2(20),
value_text VARCHAR2(4000)
);
Insert into UDA_DATA values('Material_ID','PBL000129 PBL000132 PBL000130 PBL000131 PBL000133');
Insert into UDA_DATA values('Material_ID','PBL0001341 PBL0001381 PBL0001351 PBL0001361 PBL0001371');
commit;
Now if we select the data from this table it will give the result something like this:
select * from UDA_DATA;
It gives result something like this:

But however I am expecting something like this:

Means it should break the value_text into two or more rows if the character length is more than 30. Also, uda column should have the suffix as 1,2..n and it should not break the text in between.
Wrote a recurvise CTE to acheive the result:
with rcte (rn, uda, value, chunk_num, value_text) as (
select rownum,
uda,
substr(value_text, 1, 30),
1,
substr(value_text, 31)
from uda_data
union all
select rn,
uda,
substr(value_text, 1, 30),
chunk_num + 1,
substr(value_text, 31)
from rcte
where value_text is not null
)
select uda || chunk_num as uda, value
from rcte
order by rn, chunk_num;
Which is giving the result like this:

In third row, it's breaking the text in between which is not correct, want a result something this:

Anyhelp will be highly appreciated.
Here's one way:
The rows have the same values for uda, so I've also given these each a row_number first so you can tell them apart.
I've used match_recogonize to group together the rows up to thirty characters long. Then listagg to recombine the words in each group.
You could adapt recursive with to do this too
Which gives:
with ranks as (
select u.*,
row_number () over (
order by uda, value_text
) rk
from uda_data u
), rws as (
select rk, uda, rn,
regexp_substr ( value_text, '[^ ]+', 1, rn ) || ' ' str
from ranks, lateral (
select level rn from dual
connect by level <= regexp_count ( value_text, ' ' ) + 1
)
), grps as (
select *
from rws
match_recognize (
partition by rk
order by rn
measures
match_number() as grp,
sum ( length ( str ) ) as len
all rows per match
pattern ( thirty+ )
define
thirty as sum ( length ( str ) ) <= 30
)
)
select uda || grp,
listagg ( str )
within group ( order by rn ) strs
from grps
group by rk, uda || grp;
UDA||GRP STRS
Material_ID1 PBL000129 PBL000132 PBL000130
Material_ID2 PBL000131 PBL000133
Material_ID1 PBL0001341 PBL0001381
Material_ID2 PBL0001351 PBL0001361
Material_ID3 PBL0001371
Note: the splitting-recombining trick increases the number of rows you process. If the input strings will be long, this method could be very slow. If the majority will split into 2-3 groups, this is probably OK - though obviously test on your data!
The problem with your query is that while you want to split the string on blanks, you are not even trying to do that. You are using substr(value_text, 1, 30) instead, which doesn't care for where the blanks are located in the string.
You can use INSTR to find the split position. Below query works fine, as long as there is no code inside that is longer than 30 characters (in which case INSTR returns -1 which leads to a cycle in the recursive query). You may want to adjust the query for this case.
with rcte (rn, uda, value, chunk_num, value_text) as (
select rownum,
uda,
substr(trim(value_text), 1, instr(substr(value_text || ' ', 1, 31), ' ', -1) - 1),
1,
trim(substr(trim(value_text), instr(substr(value_text || ' ', 1, 31), ' ', -1) + 1))
from uda_data
union all
select rn,
uda,
substr(value_text, 1, instr(substr(value_text || ' ', 1, 31), ' ', -1) - 1),
chunk_num + 1,
trim(substr(value_text, instr(substr(value_text || ' ', 1, 31), ' ', -1) + 1))
from rcte
where value_text is not null
)
select uda || rn || '/' || chunk_num as uda, value
from rcte
order by rn, chunk_num;
Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=8dfc8e55a12c4666b4bc7bfcaceea2d2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With