Is it possible to evaluate formulas stored in a dataframe column when those formulas refer to other columns of the same row? For example, given data like this (Scala example):
val df = Seq(
( 1, "(a+b)/d", 1, 20, 2, 3, 1 ),
( 2, "(c+b)*(a+e)", 0, 1, 2, 3, 4 ),
( 3, "a*(d+e+c)", 7, 10, 6, 2, 1 )
)
.toDF( "Id", "formula", "a", "b", "c", "d", "e" )
df.show()
Expected results: a result column holding the evaluated formula for each row (7, 12 and 63 for Ids 1, 2 and 3).
I have been unable to get selectExpr, expr, eval() or combinations of them to work.
You can use the Scala reflection ToolBox's eval inside a UDF:
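import org.apache.spark.sql.functions.{array, col, udf}
import scala.reflect.runtime.universe
import scala.tools.reflect.ToolBox
val tb = universe.runtimeMirror(getClass.getClassLoader).mkToolBox()
// Every column except Id: the formula followed by the variables a..e
val cols = df.columns.tail
val eval_udf = udf(
  (r: Seq[String]) =>
    // Prefix the formula r(0) with one "val <name> = <value>;" per variable, then evaluate it
    tb.eval(tb.parse(
      ("val %s = %s;" * cols.tail.size).format(
        cols.tail.zip(r.tail).flatMap(x => List(x._1, x._2)): _*
      ) + r(0)
    )).toString
)
// Pass the formula and the variable columns to the UDF as a single array
val df2 = df.select(col("id"), eval_udf(array(df.columns.tail.map(col): _*)).as("result"))
df2.show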
+---+------+
| id|result|
+---+------+
|  1|     7|
|  2|    12|
|  3|    63|
+---+------+
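To make it easier to see what the UDF above hands to the toolbox, here is a minimal sketch in plain Scala (no Spark, no toolbox) that builds the same source string for the first row. The names `names`, `values`, `formula` and `source` are illustrative and do not appear in the answer; the values are those of row 1.

```scala
// Variable names and their values for row 1, as strings (the UDF receives strings too)
val names   = Seq("a", "b", "c", "d", "e")
val values  = Seq("1", "20", "2", "3", "1")
val formula = "(a+b)/d"

// One "val <name> = <value>;" per variable, followed by the formula itself
val source =
  ("val %s = %s;" * names.size).format(
    names.zip(values).flatMap { case (n, v) => List(n, v) }: _*
  ) + formula

println(source)
// prints: val a = 1;val b = 20;val c = 2;val d = 3;val e = 1;(a+b)/d
```

Because the values are declared as Scala integers, the toolbox evaluates `(a+b)/d` with integer division, which is why row 1 yields 7.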
A slightly different version of mck's answer: replace the variables in the formula column with their corresponding values from the other columns, then call an eval UDF:
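import org.apache.spark.sql.functions.{expr, udf}
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox
// Evaluates a formula whose variables have already been replaced by literal values
val eval = udf((f: String) => {
  val toolbox = currentMirror.mkToolBox()
  toolbox.eval(toolbox.parse(f)).toString
})
// Builds nested SQL calls: replace(replace(formula, 'a', a), 'b', b) ... one per variable column
val formulaExpr = expr(df.columns.drop(2).foldLeft("formula")((acc, c) => s"replace($acc, '$c', $c)"))
df.select($"Id", eval(formulaExpr).as("result")).show()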
//+---+------+
//| Id|result|
//+---+------+
//|  1|     7|
//|  2|    12|
//|  3|    63|
//+---+------+
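The substitution step can be illustrated in plain Scala, without Spark, as a rough sketch. It mimics the nested `replace(...)` calls with `String.replace` on the values of row 2; the names `row2`, `substituted`, `clash` and `broken` are made up for this example.

```scala
// Row 2 of the sample data: variable name -> value (as strings)
val row2 = Seq("a" -> "0", "b" -> "1", "c" -> "2", "d" -> "3", "e" -> "4")

// Replace each variable name in the formula by its value, one column at a time
val substituted = row2.foldLeft("(c+b)*(a+e)") {
  case (acc, (name, value)) => acc.replace(name, value)
}
println(substituted) // prints: (2+1)*(0+4)

// Caveat: plain substring replacement breaks when one name is contained in another
val clash  = Seq("a" -> "1", "ab" -> "2")
val broken = clash.foldLeft("ab+a") {
  case (acc, (name, value)) => acc.replace(name, value)
}
println(broken) // prints: 1b+1
```

With the single-letter column names and numeric values of the question this is not a problem, but with longer or overlapping column names the first answer's approach of declaring each variable as a `val` is likely the safer choice.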