Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Snowflake: Replace multiple patterns using single Regex_replace

Requirement: Want to achieve all below into one regex_replace

  1. Replace comma, tab and new line character to empty
  2. Replace double quotes into a single quote
  3. Replace pipe into a dash

Also would like to know if replace performance is faster than regex_replace

Below is what I tried out

    with c as (select ' abc \n def\n' as t
              union 
              select 'abc\tdef' as t)          
    select t,regexp_replace(t, '[,\t\n]','') from c;
   
    with c as (select ' abc  de"f\n' as t
              union
              select 'abc\td|ef' as t)
    select t,replace(replace(replace(t, '\n',' '),'"','\''),'|','-') from c;
like image 821
Kar Avatar asked Apr 12 '26 17:04

Kar


1 Answers

The regexp_replace function will only match one pattern and replace it with a literal value, so you can't make it selectively replace depending on what it matches. You can write a UDF to do that in one step, which will make your SQL a lot neater, particularly if you have to put it in a lot of statements. This one does all three replacements:

create or replace function CLEANUP_STRING(STR string)
returns string
language javascript as
$$
    var s = STR.replace(/[,\t\n]/g, "");  // Replace comma, tab and new line character to empty
    s = s.replace(/"/g, "'");             // Replace double quote with single quote
    s = s.replace(/\|/g, "-");            // Replace pipe with dash
    return s;
$$;

-- This will display in black font, showing that the line break is gone.
select cleanup_string('This is a "double quoted string". This has a | pipe. This has a new \n line. This has a, comma.');
like image 71
Greg Pavlik Avatar answered Apr 15 '26 07:04

Greg Pavlik