I have a PySpark DataFrame with a column of strings. How can I check which rows in it are numeric? I could not find any relevant function in PySpark's official documentation.
values = [('25q36',),('75647',),('13864',),('8758K',),('07645',)]
df = sqlContext.createDataFrame(values,['ID',])
df.show()
+-----+
| ID|
+-----+
|25q36|
|75647|
|13864|
|8758K|
|07645|
+-----+
In Python, strings have an .isdigit() method, which returns True or False depending on whether the string contains only digits.
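For plain Python strings, that check looks like this (a quick illustration of .isdigit(), not a PySpark solution):

```python
# str.isdigit() is True only when the string is non-empty
# and every character in it is a digit.
print('75647'.isdigit())  # True
print('25q36'.isdigit())  # False: contains the letter 'q'
print('07645'.isdigit())  # True: leading zeros are still digits
```

The question is essentially asking for this behaviour applied column-wise in Spark, without a UDF.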
Expected DataFrame:
+-----+-----+
|   ID|Value|
+-----+-----+
|25q36|False|
|75647| True|
|13864| True|
|8758K|False|
|07645| True|
+-----+-----+
I would like to avoid creating a UDF.
Indeed I enjoyed the creative solution provided by Steven, but here is a much simpler suggestion for this kind of situation:
df.filter(~df.ID.rlike(r'\D+')).show()
First, rlike(r'\D+') selects every row that contains a non-digit character; the ~ at the beginning of the filter then excludes those rows, leaving only the purely numeric IDs. Note that this filters the DataFrame down to the numeric rows rather than adding a Boolean column.