I want to filter a Spark DataFrame so that it keeps only the rows
whose Email column looks like a real address. Here's what I tried:
df.filter($"Email" match {case ".*@.*".r => true case _ => false})
But this doesn't work. What is the right way to do it?
To expand on @TomTom101's comment, the code you're looking for is:
df.filter($"Email" rlike ".*@.*")
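The primary reason the match doesn't work is that DataFrame has two filter functions, which take either a String or a Column. This is unlike RDD, which has one filter that takes a function from T to Boolean. The match expression is plain Scala that runs once against the Column object itself rather than against each row's value, so it cannot produce the Column predicate that filter expects.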
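To make the two filter variants concrete, here is a minimal sketch. It assumes Spark 2.x or later with a local SparkSession; the object name EmailFilterSketch, the helper names, and the sample addresses are made up for illustration and are not from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object, for illustration only.
object EmailFilterSketch {
  // Column variant: rlike builds a Column predicate that Spark evaluates per row.
  def keepEmails(df: DataFrame): DataFrame =
    df.filter(col("Email").rlike(".*@.*"))

  // String variant: the same condition written as a SQL expression.
  def keepEmailsSql(df: DataFrame): DataFrame =
    df.filter("Email rlike '.*@.*'")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("email-filter-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val df = Seq("alice@example.com", "not-an-email", "bob@example.org").toDF("Email")

    keepEmails(df).show()
    keepEmailsSql(df).show()

    spark.stop()
  }
}
```

Both helpers should keep the same rows. If you actually want to drop the rows that look like emails instead, negating the Column should work: df.filter(!col("Email").rlike(".*@.*")).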