I have a string column description in a Hive table which may contain tab characters ('\t'). These characters are breaking some views when I connect Hive to an external application. Is there a simple way to get rid of all tab characters in that column? I could run a simple Python program to do it, but I would like to find a better solution.
Use the regexp_replace function to replace or remove characters that match a pattern.
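For the tab-character question above, a minimal sketch could look like the following. The table name my_table is hypothetical; the column name description comes from the question. It assumes Hive's usual string-literal escaping, where '\\t' reaches the regex engine as \t and so matches a tab.

```sql
-- Hypothetical table name: my_table. Column `description` is taken from the question.
-- '\\t' is unescaped by Hive to the regex \t, which matches a single tab character.
SELECT regexp_replace(description, '\\t', '') AS description
FROM my_table;
```

To keep words from running together, pass ' ' as the third argument so each tab becomes a space instead of being removed.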
Use the nvl() function in Hive to replace all NULL values of a column with a default value. For an integer column, replace NULL values with -1, 0, or any other number. For string types, replace NULL values with an empty string. You can use any default value that suits your needs.
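As a rough illustration of nvl(), the sketch below substitutes defaults for NULLs in one integer column and one string column. The table my_table and the column view_count are hypothetical names chosen only for this example.

```sql
-- Hypothetical table and columns: my_table, view_count (INT), description (STRING).
SELECT
  nvl(view_count, 0)   AS view_count,   -- NULL integers become 0
  nvl(description, '') AS description   -- NULL strings become the empty string
FROM my_table;
```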
Replace multiple strings with another string: list the existing strings in the regexp_replace pattern, separated by the pipe (|) character, and every occurrence of any of them will be replaced with the given new string value.
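The pipe alternation can be sketched as follows. The first statement uses only literals and assumes a Hive version that allows SELECT without a FROM clause; the second uses a hypothetical table my_table with the description column from the question.

```sql
-- Literal input, so no table is needed: both 'oo' and 'ar' are removed.
SELECT regexp_replace('foobar', 'oo|ar', '');   -- returns fb

-- Hypothetical table my_table: replace each tab or newline in description with one space.
SELECT regexp_replace(description, '\\t|\\n', ' ') AS description
FROM my_table;
```

Because the pattern is a Java regular expression, any string that contains regex metacharacters (such as . or *) would need to be escaped before being listed as an alternative.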
The regexp_replace UDF performs my task. Below is the definition and usage from the Apache Hive wiki.

regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT):
This returns the string resulting from replacing all substrings in INITIAL_STRING that match the Java regular expression syntax defined in PATTERN with instances of REPLACEMENT, e.g.: regexp_replace("foobar", "oo|ar", "") returns fb