Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance of regexp_replace vs translate in Oracle?

For simple things is it better to use the translate function on the premise that it is less CPU intensive or is regexp_replace the way to go?

This question comes forth from How can I replace brackets to hyphens within Oracle REGEXP_REPLACE function?

like image 443
Colin 't Hart Avatar asked Apr 17 '13 11:04

Colin 't Hart


People also ask

What is the difference between Translate and replace in Oracle?

REPLACE() replaces one string with another string. Therefore, if a string contains multiple characters, each character must be in the same order. TRANSLATE() on the other hand, replaces each character one by one, regardless of the order of those characters.

What is the use of REGEXP_REPLACE in Oracle?

The REGEXP_REPLACE function is used to return source_char with every occurrence of the regular expression pattern replaced with replace_string. The string returned is in the same character set as source_char. It returns VARCHAR2 if the first argument is not a LOB and returns CLOB if the first argument is a LOB.

What is the use of REGEXP_SUBSTR in Oracle?

REGEXP_SUBSTR extends the functionality of the SUBSTR function by letting you search a string for a regular expression pattern. It is also similar to REGEXP_INSTR , but instead of returning the position of the substring, it returns the substring itself.


2 Answers

I think you're running into simple optimization. The regexp expression is so expensive to compute that the result is cached in the hope that it will be used again in the future. If you actually use distinct strings to convert, you will see that the modest translate is naturally faster because it is its specialized function.

Here's my example, running on 11.1.0.7.0:

SQL> DECLARE
  2     TYPE t IS TABLE OF VARCHAR2(4000);
  3     l       t;
  4     l_level NUMBER := 1000;
  5     l_time  TIMESTAMP;
  6     l_char  VARCHAR2(4000);
  7  BEGIN
  8     -- init
  9     EXECUTE IMMEDIATE 'ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL=2';
 10     SELECT dbms_random.STRING('p', 2000)
 11       BULK COLLECT
 12       INTO l FROM dual
 13     CONNECT BY LEVEL <= l_level;
 14     -- regex
 15     l_time := systimestamp;
 16     FOR i IN 1 .. l.count LOOP
 17        l_char := regexp_replace(l(i), '[]()[]', '-', 1, 0);
 18     END LOOP;
 19     dbms_output.put_line('regex     :' || (systimestamp - l_time));
 20     -- tranlate
 21     l_time := systimestamp;
 22     FOR i IN 1 .. l.count LOOP
 23        l_char := translate(l(i), '()[]', '----');
 24     END LOOP;
 25     dbms_output.put_line('translate :' || (systimestamp - l_time));
 26  END;
 27  /

regex     :+000000000 00:00:00.979305000
translate :+000000000 00:00:00.238773000

PL/SQL procedure successfully completed

on 11.2.0.3.0 :

regex     :+000000000 00:00:00.617290000
translate :+000000000 00:00:00.138205000

Conclusion: In general I suspect translate will win.

like image 165
Vincent Malgrat Avatar answered Oct 02 '22 11:10

Vincent Malgrat


For SQL, I tested this with the following script:

set timing on

select sum(length(x)) from (
  select translate('(<FIO>)', '()[]', '----') x
  from (
    select *
    from dual
    connect by level <= 2000000
  )
);

select sum(length(x)) from (
  select regexp_replace('[(<FIO>)]', '[\(\)\[]|\]', '-', 1, 0) x
  from (
    select *
    from dual
    connect by level <= 2000000
  )
);

and found that the performance of translate and regexp_replace were almost always the same, but it could be that the cost of the other operations is overwhelming the cost of the functions I'm trying to test.

Next, I tried a PL/SQL version:

set timing on

declare
  x varchar2(100);
begin
  for i in 1..2500000 loop
    x := translate('(<FIO>)', '()[]', '----');
  end loop;
end;
/

declare
  x varchar2(100);
begin
  for i in 1..2500000 loop
    x := regexp_replace('[(<FIO>)]', '[\(\)\[]|\]', '-', 1, 0);
  end loop;
end;
/

Here the translate version takes just under 10 seconds, while the regexp_replace version around 0.2 seconds -- around 2 orders of magnitude faster(!)

Based on this result, I will be using regular expressions much more often in my performance critical code -- both SQL and PL/SQL.

like image 31
Colin 't Hart Avatar answered Oct 02 '22 12:10

Colin 't Hart