Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

filter DataFrame with Regex with Spark in Scala

I want to filter out rows in Spark DataFrame that have Email column that look like real, here's what I tried:

df.filter($"Email" match {case ".*@.*".r => true case _ => false})

But this doesn't work. What is the right way to do it?

like image 953
Bamqf Avatar asked Nov 27 '15 21:11

Bamqf


1 Answers

To expand on @TomTom101's comment, the code you're looking for is:

df.filter($"Email" rlike ".*@.*")

The primary reason why the match doesn't work is because DataFrame has two filter functions which take either a String or a Column. This is unlike RDD with one filter that takes a function from T to Boolean.

like image 60
Matthew Graves Avatar answered Nov 07 '22 02:11

Matthew Graves