I have more than 100 columns in a DataFrame, 90 of which are metric columns. I need to find the rows in which at least one metric is not 0. Currently I filter with something like metric1 <> 0 or metric2 <> 0 and so on. Is there a better way to handle this?
Here are some more options, all presuming that the target columns have names such as metric1, metric2, metric3 ... metricN.
First let's identify the target columns:
val targetColumns = df.columns.filter(_.matches("metric\\d+"))
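To see what that selection does without a Spark session, here is a minimal plain-Scala sketch. The column list is hypothetical and stands in for `df.columns`. Because `String.matches` has to match the whole name, names that only contain the pattern are left out.

```scala
// Hypothetical column names standing in for df.columns
val columns = Seq("id", "name", "metric1", "metric2", "metric10", "metric_total", "xmetric3")

// matches() requires the whole string to match, so "metric_total"
// and "xmetric3" are excluded even though they contain "metric"
val targetColumns = columns.filter(_.matches("metric\\d+"))

println(targetColumns.mkString(", "))
```

If your metric columns follow a different naming scheme, only the regular expression needs to change.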
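Option 1: Filter using greatest, which returns the largest value among the given columns for each row. This works only if the metrics are never negative; otherwise a row such as (-1, 0) has a greatest value of 0 and is dropped: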
import org.apache.spark.sql.functions.{col, greatest}
df.filter(greatest(targetColumns.map(col): _*) =!= 0)
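Option 2: Apply a bitwise OR across the columns. The result is non-zero as soon as any column is non-zero (this works for integral column types only):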
import org.apache.spark.sql.functions.col
val bitwiseORCols = targetColumns.map(col).reduce(_ bitwiseOR _)
df.filter(bitwiseORCols =!= 0)
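A further variant, not taken from the answers here, is to build the `metric1 <> 0 or metric2 <> 0 ...` predicate from the question programmatically. This also handles negative and non-integral metrics. The sketch below is meant for spark-shell and assumes Spark 2.x or later; the sample data and the name `anyNonZero` are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// In spark-shell a session already exists; getOrCreate() reuses it
val spark = SparkSession.builder().master("local[1]").appName("non-zero-metrics").getOrCreate()
import spark.implicits._

// Hypothetical sample data; note the negative metric in row 3
val df = Seq(
  (1, "name1", 3, 0, 0),
  (2, "name2", 0, 0, 0),
  (3, "name3", 0, -3, 3),
  (4, "name4", 0, 0, 0)
).toDF("id", "name", "metric1", "metric2", "metric3")

val targetColumns = df.columns.filter(_.matches("metric\\d+"))

// One "column =!= 0" test per metric, combined with a logical OR
val anyNonZero = targetColumns.map(col(_) =!= 0).reduce(_ || _)

val nonZeroRows = df.filter(anyNonZero)
nonZeroRows.show()
```

Spark sees this as an ordinary boolean expression, so no UDF is involved and the optimizer can work with it as usual.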
Alternatively, you can build an array column from your metric columns and use a UDF to check whether that array contains any non-zero value.
scala> df.show
+---+-----+-------+-------+-------+
| id| name|metric1|metric2|metric3|
+---+-----+-------+-------+-------+
|  1|name1|      3|      0|      0|
|  2|name2|      0|      0|      0|
|  3|name3|      0|      3|      3|
|  4|name4|      0|      0|      0|
+---+-----+-------+-------+-------+
scala> def arrayNotAllZeros[T](a: Seq[T]):Boolean = {
| a.exists(_ != 0)
| }
arrayNotAllZeros: [T](a: Seq[T])Boolean
scala>
scala> val myUdf = udf { arrayNotAllZeros[Int] _ }
myUdf: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,BooleanType,Some(List(ArrayType(IntegerType,false))))
scala>
scala> val metricCols = df.columns.takeRight(3)
metricCols: Array[String] = Array(metric1, metric2, metric3)
scala> df.withColumn("nonZeroRow", myUdf(array(metricCols.head, metricCols.tail:_*))).show
+---+-----+-------+-------+-------+----------+
| id| name|metric1|metric2|metric3|nonZeroRow|
+---+-----+-------+-------+-------+----------+
|  1|name1|      3|      0|      0|      true|
|  2|name2|      0|      0|      0|     false|
|  3|name3|      0|      3|      3|      true|
|  4|name4|      0|      0|      0|     false|
+---+-----+-------+-------+-------+----------+