How to remove non-alphanumeric or non-numeric characters with the Hive REGEXP_EXTRACT() function


I've been trying to figure out how to remove all non-alphanumeric or non-numeric characters from a string, or equivalently how to return only its numeric characters. I've tried:

SELECT
regexp_extract('X789', '[0-9]', 0)
FROM
table_name

But it returns '7', not '789'.
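
The single-character result follows from the pattern itself: [0-9] matches exactly one digit, and regexp_extract returns only the first match. A minimal sketch of the difference, assuming a Hive version that accepts SELECT without a FROM clause (0.13 or later); otherwise append FROM table_name as in the query above:

```sql
-- [0-9] matches a single digit, so only the first digit comes back.
SELECT regexp_extract('X789', '[0-9]', 0);    -- '7'

-- [0-9]+ matches a run of one or more digits.
SELECT regexp_extract('X789', '[0-9]+', 0);   -- '789'

-- Only the first run is returned when the digits are not contiguous.
SELECT regexp_extract('7X789', '[0-9]+', 0);  -- '7'
```

The last query shows why regexp_extract alone is not enough for cleaning: it returns one match per call, so any digits after the first non-digit are lost.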

I've also tried to remove non-numeric characters using the negative-lookahead "not match" idiom ^((?!regexp).)*$:

SELECT
REGEXP_REPLACE('X789', '^((?![0-9]).)*$', '')
FROM
jav_test_ii
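
This attempt most likely returns the input unchanged. The pattern is anchored with ^ and $, so it only matches when the entire string contains no digit at all; it does not match individual non-digit characters. A short sketch of that behavior, again assuming SELECT without FROM is available:

```sql
-- The string contains digits, so the anchored pattern never matches
-- and nothing is replaced.
SELECT regexp_replace('X789', '^((?![0-9]).)*$', '');  -- 'X789'

-- A string with no digits matches as a whole and is removed entirely.
SELECT regexp_replace('XYZ', '^((?![0-9]).)*$', '');   -- ''
```

The idiom answers "does this whole string avoid the pattern?", which is a different question from "which characters should be dropped?".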

Can regexp_extract return multiple matches? What I'm really trying to do is clean my data so that it contains only numbers, or only alphanumeric characters. The following helps remove a bad character, but it targets one literal character rather than a range of characters like [0-9]: regexp_replace(string, '�', '')

EDIT: The query below was able to return '7789', which is exactly what I was looking for.

SELECT
regexp_replace("7X789", "[^0-9]+", "")
FROM
table_name
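
The same negated character class extends to the alphanumeric case from the original question. A minimal sketch, where raw_value is a hypothetical column name used only for illustration and table_name is the placeholder from the queries above:

```sql
-- Keep only ASCII letters and digits; every other character is removed.
SELECT regexp_replace('a-1_b 2!', '[^a-zA-Z0-9]+', '');  -- 'a1b2'

-- Applied to a column (raw_value is a hypothetical name):
-- one output for digits only, one for letters and digits.
SELECT
  regexp_replace(raw_value, '[^0-9]+', '')       AS digits_only,
  regexp_replace(raw_value, '[^a-zA-Z0-9]+', '') AS alphanumeric_only
FROM
  table_name;
```

Listing the allowed characters and negating the class with ^ removes everything else in one pass, including stray bytes such as the one targeted earlier, without naming each bad character.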