Is there a way to filter a field not containing something in a spark dataframe using scala?

Question

Hopefully I'm stupid and this will be easy.

I have a dataframe containing the columns 'url' and 'referrer'.

I want to extract all the referrers that contain my domain, either 'www.mydomain.com' or 'mydomain.co'.

I can use

val filteredDf = unfilteredDf.filter($"referrer".contains("www.mydomain."))

However, this also pulls out www.google.co.uk search URLs, which for some reason also contain my domain. Is there a way, using Scala in Spark, to filter out anything containing "google" while keeping the correct results I already have?

Thanks

Dean

zero323 · Accepted Answer

You can negate a predicate with either the not function or the ! operator, so all that's left is to add another condition:

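import org.apache.spark.sql.functions.not
import spark.implicits._ // provides $"..."; assumes a SparkSession named spark (spark-shell imports this automatically)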

df.where($"referrer".contains("www.mydomain.") &&
  not($"referrer".contains("google")))
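For anyone who wants to try this outside spark-shell, here is a minimal self-contained sketch, assuming Spark 2.x/3.x on Scala 2.12/2.13 with the spark-sql dependency on the classpath. The object name, the helper methods (`ReferrerFilter`, `sample`, `keepOwnReferrers`) and the sample rows are hypothetical. `col("referrer")` is used instead of `$"referrer"` so the filter method does not need `spark.implicits._` in scope; the predicate itself is the same one shown above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, not}

object ReferrerFilter {
  // Hypothetical sample rows; the Google row mentions the domain only in its query string
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._ // provides .toDF on local Seqs of tuples
    Seq(
      ("/home", "https://www.mydomain.com/products"),
      ("/home", "https://www.mydomain.co.uk/blog"),
      ("/home", "https://www.google.co.uk/search?q=www.mydomain.com"),
      ("/home", "https://example.org/links")
    ).toDF("url", "referrer")
  }

  // Keep referrers that mention the domain, then drop any that mention google
  def keepOwnReferrers(df: DataFrame): DataFrame =
    df.where(col("referrer").contains("www.mydomain.") &&
      not(col("referrer").contains("google")))

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark in-process using all available cores
    val spark = SparkSession.builder().master("local[*]").appName("referrer-filter").getOrCreate()
    keepOwnReferrers(sample(spark)).show(false)
    spark.stop()
  }
}
```

Running `main` should show only the two mydomain rows: the Google search row is dropped even though its query string contains `www.mydomain.com`.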

or chain a separate filter:

df
 .where($"referrer".contains("www.mydomain."))
 .where(!$"referrer".contains("google"))
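Both forms above match substrings anywhere in the URL, so a referrer such as `https://www.mydomain.com/search?q=google` is dropped, while a Bing search whose query mentions the domain slips through. One possible alternative, sketched here under the same assumptions as a swapped-in technique rather than part of the accepted answer, is to compare only the host part, extracted with Spark SQL's built-in `parse_url` function called through `expr`. The object, method and column names (`HostReferrerFilter`, `keepOwnHosts`, `referrer_host`) and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr}

object HostReferrerFilter {
  // Hypothetical sample rows chosen to show where plain substring matching goes wrong
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("/home", "https://www.mydomain.com/search?q=google"),           // internal page, should be kept
      ("/home", "https://www.mydomain.co.uk/blog"),                    // should be kept
      ("/home", "https://www.google.co.uk/search?q=www.mydomain.com"), // external, should be dropped
      ("/home", "https://www.bing.com/search?q=www.mydomain.com"),     // external, should be dropped
      ("/home", "www.mydomain.com/no-scheme")                          // no scheme: parsed host is null
    ).toDF("url", "referrer")
  }

  // Extract the host, then require it to start with "www.mydomain." (e.g. .com, .co.uk).
  // Rows whose host is null produce a null predicate and are dropped by where().
  def keepOwnHosts(df: DataFrame): DataFrame =
    df.withColumn("referrer_host", expr("parse_url(referrer, 'HOST')"))
      .where(col("referrer_host").rlike("^www\\.mydomain\\."))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("host-filter").getOrCreate()
    keepOwnHosts(sample(spark)).show(false)
    spark.stop()
  }
}
```

This assumes referrers are absolute URLs with a scheme; a value like `www.mydomain.com/no-scheme` has no parsed host and is silently dropped. Handling of malformed URLs may also differ between Spark versions (with ANSI mode enabled, newer releases may raise an error instead of returning null), so the substring approach can still be the more forgiving choice for messy data.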


Tags: scala, apache-spark, apache-spark-sql
