What is the difference between the translate and regexp_replace functions in Spark SQL?
I want to replace substrings in strings, as well as whole integer values and values of other data types such as boolean.
For example:
func("hello","e","a") = "hallo"
func(true,true ,false) = false
func(112,112,9) = 9
func(112,115,9) = 112
Which one should I use, and what are the advantages and disadvantages of each?
They are simply not equivalent:
translate maps one set of characters to another, character by character. It doesn't care about context and it doesn't use regular expressions; it only considers the character at hand. Of the examples you've provided, the only case where it applies is the single-letter substitution:
spark.sql("SELECT TRANSLATE('hello', 'e', 'a')").show()
+----------------------+
|translate(hello, e, a)|
+----------------------+
|                 hallo|
+----------------------+
In general, translate is useful for handling invalid characters and other simple cleanup tasks. It is simple to write and has little runtime overhead:
spark.sql("SELECT TRANSLATE('ed-ba', 'abcde', '12345')").show()
+------------------------------+
|translate(ed-ba, abcde, 12345)|
+------------------------------+
|                         54-21|
+------------------------------+
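To make the "character table" idea concrete, here is a minimal pure-Python sketch of the same mapping. It is not Spark's implementation; translate_chars is a hypothetical helper, and it assumes the two character strings have the same length and contain no repeated characters.

```python
def translate_chars(text: str, matching: str, replace: str) -> str:
    """Rewrite each character of `text` through a matching -> replace table.

    Hypothetical helper for illustration only. Assumes `matching` and
    `replace` have equal length; Spark's handling of other cases is not
    modelled here.
    """
    # Pair characters by position: matching[i] is rewritten to replace[i].
    table = dict(zip(matching, replace))
    # Every character is looked up on its own; neighbours are never consulted.
    return "".join(table.get(ch, ch) for ch in text)


print(translate_chars("hello", "e", "a"))
print(translate_chars("ed-ba", "abcde", "12345"))
```

The two calls mirror the two queries above. Characters that are not in the table, such as the hyphen, pass through unchanged.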
regexp_replace does exactly what its name says: it replaces everything that matches a regular expression. You get the full Java regexp machinery at your disposal. If you want to replace substrings in strings, this is the one you're looking for.
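As a rough illustration of pattern-based replacement, the sketch below uses Python's re module as a stand-in. This is an analogy, not Spark: regexp_replace_sketch is a hypothetical name, and Java and Python regular expressions differ in places (for example, group references in the replacement string), so only very simple patterns are used here.

```python
import re


def regexp_replace_sketch(text: str, pattern: str, replacement: str) -> str:
    """Replace every match of `pattern` in `text` with `replacement`.

    Hypothetical stand-in built on Python's `re`; Spark itself uses
    Java regular expressions.
    """
    return re.sub(pattern, replacement, text)


# Single-letter substitution, same result as the translate example.
print(regexp_replace_sketch("hello", "e", "a"))

# A run of characters matched as one unit, which a per-character
# mapping cannot express.
print(regexp_replace_sketch("hello world", r"l+", "L"))
```

In the second call the two consecutive l characters collapse into a single L, because the pattern matches the whole run rather than each character separately.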
Neither is applicable for replacing whole integer values or values of other data types such as boolean. For those, use CASE WHEN ... THEN ... ELSE ... END in SQL, or when(...).otherwise(...) in the DataFrame API.
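For the whole-value cases from the question, the logic is a plain equality check. The sketch below shows that logic in pure Python and also builds the corresponding SQL conditional as text, which in Spark SQL is written CASE WHEN ... THEN ... ELSE ... END. Both helper names are hypothetical, and case_when_sql assumes its arguments are already valid SQL identifiers and literals (it does no quoting or escaping).

```python
def replace_whole(value, target, replacement):
    """Return `replacement` when `value` equals `target`, else `value`."""
    return replacement if value == target else value


def case_when_sql(column: str, target_sql: str, replacement_sql: str) -> str:
    """Build the SQL text for the same check on a column.

    Hypothetical helper: arguments are inserted as-is, so they must
    already be valid SQL (no quoting or escaping is done here).
    """
    return (
        f"CASE WHEN {column} = {target_sql} "
        f"THEN {replacement_sql} ELSE {column} END"
    )


print(replace_whole(True, True, False))
print(replace_whole(112, 112, 9))
print(replace_whole(112, 115, 9))
print(case_when_sql("x", "112", "9"))
```

The generated expression could then be used inside a query, for example in a SELECT list over a column named x (the column name is an assumption for illustration).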