Can we check to see if every column in a spark dataframe contains a certain string(example "Y") using Spark-SQL or scala?
I have tried the following but don't think it is working properly.
df.select(df.col("*")).filter("'*' =='Y'")
Thanks, Sai
You can do something like this to keep the rows where all columns contain 'Y':
//Get all columns
val columns: Array[String] = df.columns
//For each column, keep the rows with 'Y'
val seqDfs: Seq[DataFrame] = columns.map(name => df.filter(s"$name == 'Y'"))
//Union all the dataframes together into one final dataframe
val output: DataFrame = seqDfs.reduceRight(_ union _)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With