Spark SQL - Regex for matching only numbers

I am trying to make sure that a particular column in a DataFrame does not contain any illegal values (non-numerical data). For this purpose I am trying to use regex matching with rlike to collect the illegal values in the data.

I need to collect the values that contain letters, spaces, commas, or any other non-digit characters. I tried:

spark.sql("select * from tabl where UPC not rlike '[0-9]*'").show()

But this doesn't work: it returns 0 rows.

Any help is appreciated. Thank you.

Hemanth asked Feb 10 '20 10:02



1 Answer

rlike looks for a match anywhere within the string. The asterisk (*) means zero or more, and zero digits can be found somewhere in every possible string, so '[0-9]*' matches every value and "not rlike" returns no rows. You need to anchor the pattern so that it matches from the beginning of the string (^) to the end of the string ($):

spark.sql("select * from tabl where UPC not rlike '^[0-9]*$'").show()
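
To see why the unanchored pattern matches everything, here is a minimal sketch in plain Python. It assumes that Python's re.search is a fair stand-in for rlike's "match anywhere" behaviour for these simple patterns (Spark itself uses Java regular expressions), and the sample values are hypothetical, not taken from the question's data.

```python
import re

# Hypothetical sample values standing in for the UPC column.
samples = ["12345", "12a45", "abc", "12 345", "1,234"]

unanchored = re.compile(r"[0-9]*")   # the pattern from the question
anchored = re.compile(r"^[0-9]*$")   # the pattern from this answer

# re.search succeeds if the pattern matches anywhere in the string,
# which is the behaviour described for rlike above.
unanchored_hits = [s for s in samples if unanchored.search(s)]
anchored_hits = [s for s in samples if anchored.search(s)]

# Even for "abc", the unanchored pattern matches: it finds zero digits
# (an empty match) at position 0.
empty_match = unanchored.search("abc")

print(unanchored_hits)   # every sample matches
print(anchored_hits)     # only the digits-only value matches
print(repr(empty_match.group()), empty_match.span())
```

Because every value matches the unanchored pattern, negating it leaves nothing, which is the 0-row result from the question.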

Alternatively, you can match any single non-numeric character within the string with [^0-9]:

spark.sql("select * from tabl where UPC rlike '[^0-9]'").show()
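
The two filters should flag the same values. The following is a rough check in plain Python rather than Spark, again assuming re.search approximates rlike for these patterns. The function names and sample values are made up for illustration.

```python
import re

def illegal_by_negated_class(values):
    # Mirrors: UPC rlike '[^0-9]' (at least one non-digit character)
    return [v for v in values if re.search(r"[^0-9]", v)]

def illegal_by_anchored_match(values):
    # Mirrors: UPC not rlike '^[0-9]*$' (not digits from start to end)
    return [v for v in values if not re.search(r"^[0-9]*$", v)]

# Hypothetical column values, including an empty string.
samples = ["012345678905", "12a45", " 123", "1,234", "12.5", ""]

flagged = illegal_by_negated_class(samples)
print(flagged)
print(flagged == illegal_by_anchored_match(samples))
```

Note that neither filter flags the empty string, since it contains no non-digit character. If empty values should count as illegal too, one option is to negate '^[0-9]+$' instead, which requires at least one digit. NULL values are also not returned by either query, because rlike on NULL evaluates to NULL and the WHERE clause drops the row.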
dre-hh answered Sep 19 '22 01:09