Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Splitting string into multiple rows in Oracle

People also ask

How do you split comma separated string and pass to in clause of select statement in Oracle?

1- Example -- ORACLE > 9. x Select Regexp_Substr('KING,JONES,FORD' ,'[^,]+' ,1 ,Level) Emp_Name From Dual Connect By Regexp_Substr('KING,JONES,FORD' ,'[^,]+' ,1 ,Level) Is Not Null; Find the employees named in a String separated by commas. Select * From Employee Emp Where Emp.

Is there a split function in Oracle?

Need Split function which will take two parameters, string to split and delimiter to split the string and return a table with columns Id and Data. And how to call Split function which will return a table with columns Id and Data. Id column will contain sequence and data column will contain data of the string.

How do I split a string in SQL?

The STRING_SPLIT(string, separator) function in SQL Server splits the string in the first argument by the separator in the second argument. To split a sentence into words, specify the sentence as the first argument of the STRING_SPLIT() function and ' ' as the second argument. FROM STRING_SPLIT( 'An example sentence.


This may be an improved way (also with regexp and connect by):

with temp as
(
    select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error  from dual
    union all
    select 109, 'test2', 'Err1' from dual
)
select distinct
  t.name, t.project,
  trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value))  as error
from 
  temp t,
  table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.error, '[^,]+'))  + 1) as sys.OdciNumberList)) levels
order by name

EDIT: Here is a simple (as in, "not in depth") explanation of the query.

  1. length (regexp_replace(t.error, '[^,]+')) + 1 uses regexp_replace to erase anything that is not the delimiter (comma in this case) and length +1 to get how many elements (errors) are there.
  2. The select level from dual connect by level <= (...) uses a hierarchical query to create a column with an increasing number of matches found, from 1 to the total number of errors.

    Preview:

    select level, length (regexp_replace('Err1, Err2, Err3', '[^,]+'))  + 1 as max 
    from dual connect by level <= length (regexp_replace('Err1, Err2, Err3', '[^,]+'))  + 1
    
  3. table(cast(multiset(.....) as sys.OdciNumberList)) does some casting of oracle types.
    • The cast(multiset(.....)) as sys.OdciNumberList transforms multiple collections (one collection for each row in the original data set) into a single collection of numbers, OdciNumberList.
    • The table() function transforms a collection into a resultset.
  4. FROM without a join creates a cross join between your dataset and the multiset. As a result, a row in the data set with 4 matches will repeat 4 times (with an increasing number in the column named "column_value").

    Preview:

    select * from 
    temp t,
    table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.error, '[^,]+'))  + 1) as sys.OdciNumberList)) levels
    
  5. trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) uses the column_value as the nth_appearance/ocurrence parameter for regexp_substr.
  6. You can add some other columns from your data set (t.name, t.project as an example) for easy visualization.

Some references to Oracle docs:

  • REGEXP_REPLACE
  • REGEXP_SUBSTR
  • Extensibility Constants, Types, and Mappings (OdciNumberList)
  • CAST (multiset)
  • Hierarchical Queries

regular expressions is a wonderful thing :)

with temp as  (
       select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error  from dual
       union all
       select 109, 'test2', 'Err1' from dual
     )

SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str
  FROM (SELECT Name, Project, Error str FROM temp) t
CONNECT BY instr(str, ',', 1, level - 1) > 0
order by Name

There is a huge difference between the below two:

  • splitting a single delimited string
  • splitting delimited strings for multiple rows in a table.

If you do not restrict the rows, then the CONNECT BY clause would produce multiple rows and will not give the desired output.

  • For single delimited string, look at Split single comma delimited string into rows
  • For splitting delimited strings in a table, look at Split comma delimited strings in a table

Apart from Regular Expressions, a few other alternatives are using:

  • XMLTable
  • MODEL clause

Setup

SQL> CREATE TABLE t (
  2    ID          NUMBER GENERATED ALWAYS AS IDENTITY,
  3    text        VARCHAR2(100)
  4  );

Table created.

SQL>
SQL> INSERT INTO t (text) VALUES ('word1, word2, word3');

1 row created.

SQL> INSERT INTO t (text) VALUES ('word4, word5, word6');

1 row created.

SQL> INSERT INTO t (text) VALUES ('word7, word8, word9');

1 row created.

SQL> COMMIT;

Commit complete.

SQL>
SQL> SELECT * FROM t;

        ID TEXT
---------- ----------------------------------------------
         1 word1, word2, word3
         2 word4, word5, word6
         3 word7, word8, word9

SQL>

Using XMLTABLE:

SQL> SELECT id,
  2         trim(COLUMN_VALUE) text
  3  FROM t,
  4    xmltable(('"'
  5    || REPLACE(text, ',', '","')
  6    || '"'))
  7  /

        ID TEXT
---------- ------------------------
         1 word1
         1 word2
         1 word3
         2 word4
         2 word5
         2 word6
         3 word7
         3 word8
         3 word9

9 rows selected.

SQL>

Using MODEL clause:

SQL> WITH
  2  model_param AS
  3     (
  4            SELECT id,
  5                      text AS orig_str ,
  6                   ','
  7                          || text
  8                          || ','                                 AS mod_str ,
  9                   1                                             AS start_pos ,
 10                   Length(text)                                   AS end_pos ,
 11                   (Length(text) - Length(Replace(text, ','))) + 1 AS element_count ,
 12                   0                                             AS element_no ,
 13                   ROWNUM                                        AS rn
 14            FROM   t )
 15     SELECT   id,
 16              trim(Substr(mod_str, start_pos, end_pos-start_pos)) text
 17     FROM     (
 18                     SELECT *
 19                     FROM   model_param MODEL PARTITION BY (id, rn, orig_str, mod_str)
 20                     DIMENSION BY (element_no)
 21                     MEASURES (start_pos, end_pos, element_count)
 22                     RULES ITERATE (2000)
 23                     UNTIL (ITERATION_NUMBER+1 = element_count[0])
 24                     ( start_pos[ITERATION_NUMBER+1] = instr(cv(mod_str), ',', 1, cv(element_no)) + 1,
 25                     end_pos[iteration_number+1] = instr(cv(mod_str), ',', 1, cv(element_no) + 1) )
 26                 )
 27     WHERE    element_no != 0
 28     ORDER BY mod_str ,
 29           element_no
 30  /

        ID TEXT
---------- --------------------------------------------------
         1 word1
         1 word2
         1 word3
         2 word4
         2 word5
         2 word6
         3 word7
         3 word8
         3 word9

9 rows selected.

SQL>

A couple of more examples of the same:

SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
  FROM dual
CONNECT BY LEVEL <= regexp_count('Err1, Err2, Err3', ',')+1
/

SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
  FROM dual
CONNECT BY LEVEL <= length('Err1, Err2, Err3') - length(REPLACE('Err1, Err2, Err3', ',', ''))+1
/

Also, may use DBMS_UTILITY.comma_to_table & table_to_comma: http://www.oracle-base.com/articles/9i/useful-procedures-and-functions-9i.php#DBMS_UTILITY.comma_to_table


I would like to propose a different approach using a PIPELINED table function. It's somewhat similar to the technique of the XMLTABLE, except that you are providing your own custom function to split the character string:

-- Create a collection type to hold the results
CREATE OR REPLACE TYPE typ_str2tbl_nst AS TABLE OF VARCHAR2(30);
/

-- Split the string according to the specified delimiter
CREATE OR REPLACE FUNCTION str2tbl (
  p_string    VARCHAR2,
  p_delimiter CHAR DEFAULT ',' 
)
RETURN typ_str2tbl_nst PIPELINED
AS
  l_tmp VARCHAR2(32000) := p_string || p_delimiter;
  l_pos NUMBER;
BEGIN
  LOOP
    l_pos := INSTR( l_tmp, p_delimiter );
    EXIT WHEN NVL( l_pos, 0 ) = 0;
    PIPE ROW ( RTRIM( LTRIM( SUBSTR( l_tmp, 1, l_pos-1) ) ) );
    l_tmp := SUBSTR( l_tmp, l_pos+1 );
  END LOOP;
END str2tbl;
/

-- The problem solution
SELECT name, 
       project, 
       TRIM(COLUMN_VALUE) error
  FROM t, TABLE(str2tbl(error));

Results:

      NAME PROJECT    ERROR
---------- ---------- --------------------
       108 test       Err1
       108 test       Err2
       108 test       Err3
       109 test2      Err1

The problem with this type of approach is that often the optimizer won't know the cardinality of the table function and it will have to make a guess. This could be potentialy harmful to your execution plans, so this solution can be extended to provide execution statistics for the optimizer.

You can see this optimizer estimate by running an EXPLAIN PLAN on the query above:

Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806

----------------------------------------------------------------------------------------------
| Id  | Operation                          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |         | 16336 |   366K|    59   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                      |         | 16336 |   366K|    59   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                | T       |     2 |    42 |     3   (0)| 00:00:01 |
|   3 |   COLLECTION ITERATOR PICKLER FETCH| STR2TBL |  8168 | 16336 |    28   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Even though the collection has only 3 values, the optimizer estimated 8168 rows for it (default value). This may seem irrelevant at first, but it may be enough for the optimizer to decide for a sub-optimal plan.

The solution is to use the optimizer extensions to provide statistics for the collection:

-- Create the optimizer interface to the str2tbl function
CREATE OR REPLACE TYPE typ_str2tbl_stats AS OBJECT (
  dummy NUMBER,

  STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
  RETURN NUMBER,

  STATIC FUNCTION ODCIStatsTableFunction ( p_function  IN  SYS.ODCIFuncInfo,
                                           p_stats     OUT SYS.ODCITabFuncStats,
                                           p_args      IN  SYS.ODCIArgDescList,
                                           p_string    IN  VARCHAR2,
                                           p_delimiter IN  CHAR DEFAULT ',' )
  RETURN NUMBER
);
/

-- Optimizer interface implementation
CREATE OR REPLACE TYPE BODY typ_str2tbl_stats
AS
  STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
  RETURN NUMBER
  AS
  BEGIN
    p_interfaces := SYS.ODCIObjectList ( SYS.ODCIObject ('SYS', 'ODCISTATS2') );
    RETURN ODCIConst.SUCCESS;
  END ODCIGetInterfaces;

  -- This function is responsible for returning the cardinality estimate
  STATIC FUNCTION ODCIStatsTableFunction ( p_function  IN  SYS.ODCIFuncInfo,
                                           p_stats     OUT SYS.ODCITabFuncStats,
                                           p_args      IN  SYS.ODCIArgDescList,
                                           p_string    IN  VARCHAR2,
                                           p_delimiter IN  CHAR DEFAULT ',' )
  RETURN NUMBER
  AS
  BEGIN
    -- I'm using basically half the string lenght as an estimator for its cardinality
    p_stats := SYS.ODCITabFuncStats( CEIL( LENGTH( p_string ) / 2 ) );
    RETURN ODCIConst.SUCCESS;
  END ODCIStatsTableFunction;

END;
/

-- Associate our optimizer extension with the PIPELINED function   
ASSOCIATE STATISTICS WITH FUNCTIONS str2tbl USING typ_str2tbl_stats;

Testing the resulting execution plan:

Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806

----------------------------------------------------------------------------------------------
| Id  | Operation                          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |         |     1 |    23 |    59   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                      |         |     1 |    23 |    59   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                | T       |     2 |    42 |     3   (0)| 00:00:01 |
|   3 |   COLLECTION ITERATOR PICKLER FETCH| STR2TBL |     1 |     2 |    28   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

As you can see the cardinality on the plan above is not the 8196 guessed value anymore. It's still not correct because we are passing a column instead of a string literal to the function.

Some tweaking to the function code would be necessary to give a closer estimate in this particular case, but I think the overall concept is pretty much explained here.

The str2tbl function used in this answer was originally developed by Tom Kyte: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:110612348061

The concept of associating statistics with object types can be further explored by reading this article: http://www.oracle-developer.net/display.php?id=427

The technique described here works in 10g+.


Starting from Oracle 12c you could use JSON_TABLE and JSON_ARRAY:

CREATE TABLE tab(Name, Project, Error) AS
SELECT 108,'test' ,'Err1, Err2, Err3' FROM dual UNION 
SELECT 109,'test2','Err1'             FROM dual;

And query:

SELECT *
FROM tab t
OUTER APPLY (SELECT TRIM(p) AS p
            FROM JSON_TABLE(REPLACE(JSON_ARRAY(t.Error), ',', '","'),
           '$[*]' COLUMNS (p VARCHAR2(4000) PATH '$'))) s;

Output:

┌──────┬─────────┬──────────────────┬──────┐
│ Name │ Project │      Error       │  P   │
├──────┼─────────┼──────────────────┼──────┤
│  108 │ test    │ Err1, Err2, Err3 │ Err1 │
│  108 │ test    │ Err1, Err2, Err3 │ Err2 │
│  108 │ test    │ Err1, Err2, Err3 │ Err3 │
│  109 │ test2   │ Err1             │ Err1 │
└──────┴─────────┴──────────────────┴──────┘

db<>fiddle demo