
Difference between translate and regexp_replace

What is the difference between the translate and regexp_replace functions in Spark SQL?

I want to replace substrings within strings, as well as whole integer values and values of other data types such as boolean.

For example:

func("hello", "e", "a") = "hallo"
func(true, true, false) = false
func(112, 112, 9) = 9
func(112, 115, 9) = 112

Which one should I use, and what are the advantages and disadvantages of each?

Deepak Punjabi asked Dec 23 '22 21:12



1 Answer

They are simply not equivalent:

  • translate maps each character in one set to the character at the same position in another set. It ignores context and does not use regular expressions; it only considers one character at a time. Of the examples you've provided, the only one it applies to is the single-letter substitution:

    spark.sql("SELECT TRANSLATE('hello', 'e', 'a')").show()
    
    +----------------------+
    |translate(hello, e, a)|
    +----------------------+
    |                 hallo|
    +----------------------+
    

    In general, translate is useful for handling invalid characters and other simple cleanup tasks. It is simple to write and has little runtime overhead:

    spark.sql("SELECT TRANSLATE('ed-ba', 'abcde', '12345')").show()
    
    +------------------------------+
    |translate(ed-ba, abcde, 12345)|
    +------------------------------+
    |                         54-21|
    +------------------------------+
    
  • regexp_replace does exactly what its name says: it replaces every substring that matches a regular expression. You get the full Java regexp machinery at your disposal. If you want to replace substrings within strings, this is the one you're looking for.

  • Neither is applicable to replacing whole integer values or values of other data types such as boolean. For this, use CASE WHEN ... THEN ... ELSE ... END in SQL, or when(...).otherwise(...) in the DataFrame API.
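
To make the three cases concrete without needing a Spark session, here is a rough plain-Python sketch of the semantics described in the list above. It is an analogy, not Spark code: the helper names translate_like, regexp_replace_like and replace_whole_value are made up for illustration, Python's re module stands in for the Java regex engine that Spark uses (the two differ in details such as group references in the replacement string), and the translate stand-in assumes both character sets have the same length.

```python
import re


def translate_like(text, matching, replacement):
    """Hypothetical stand-in for TRANSLATE: a per-character lookup, no patterns.

    Assumes `matching` and `replacement` have the same length.
    """
    table = str.maketrans(matching, replacement)
    return text.translate(table)


def regexp_replace_like(text, pattern, replacement):
    """Hypothetical stand-in for REGEXP_REPLACE: replace every regex match.

    Uses Python's `re` syntax, which is close to but not the same as Java's.
    """
    return re.sub(pattern, replacement, text)


def replace_whole_value(value, old, new):
    """Hypothetical stand-in for CASE WHEN value = old THEN new ELSE value END."""
    return new if value == old else value


if __name__ == "__main__":
    # Character-by-character mapping: each 'l' is handled on its own.
    print(translate_like("hello", "e", "a"))           # hallo
    print(translate_like("ed-ba", "abcde", "12345"))   # 54-21
    print(translate_like("hello", "l", "L"))           # heLLo

    # Pattern-based replacement: the run "ll" is matched as one unit.
    print(regexp_replace_like("hello", "l+", "L"))     # heLo

    # Whole-value comparison works for any type, not just strings.
    print(replace_whole_value(112, 112, 9))            # 9
    print(replace_whole_value(112, 115, 9))            # 112
    print(replace_whole_value(True, True, False))      # False
```

The contrast between the third and fourth calls is the main point: the character mapping can only swap individual characters, while the regex version can treat a multi-character match as one substring. The last helper is the only one that compares the entire value, which is what the integer and boolean examples in the question need.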

zero323 answered Apr 08 '23 02:04
