
Check every column in a spark dataframe has a certain value

Can we check whether every column in a Spark DataFrame contains a certain string (for example, "Y") using Spark SQL or Scala?

I have tried the following, but I don't think it is working properly:

df.select(df.col("*")).filter("'*' =='Y'")

Thanks, Sai

asked by Bharath

1 Answer

You can do something like this to keep only the rows where every column contains 'Y':

import org.apache.spark.sql.DataFrame

// Get all the column names
val columns: Array[String] = df.columns

// Chain one filter per column: a row survives only if every column equals 'Y'
val output: DataFrame = columns.foldLeft(df)((filtered, name) => filtered.filter(s"$name == 'Y'"))
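For comparison, the same check can be written as a single Column predicate instead of string filters. The sketch below is a minimal, self-contained example: the SparkSession setup, the sample data, and the name everyColumnIsY are illustrative assumptions, not part of the original question.

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("check-all-columns").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data with two string columns
val df: DataFrame = Seq(("Y", "Y"), ("Y", "N"), ("N", "Y")).toDF("c1", "c2")

// Combine one equality test per column with AND:
// the predicate is true only when every column equals "Y"
val everyColumnIsY = df.columns.map(c => col(c) === "Y").reduce(_ && _)

df.filter(everyColumnIsY).show()
// +---+---+
// | c1| c2|
// +---+---+
// |  Y|  Y|
// +---+---+

Both forms produce the same rows; Spark's optimizer generally collapses a chain of filters into a single predicate anyway, so the choice is mostly a matter of readability.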
answered by Sohum Sachdev


