I need to define custom methods on DataFrame. What is the best way to do it? The solution should be scalable, as I intend to define a significant number of custom methods.
My current approach is to create a class (say MyClass) that takes a DataFrame as a parameter, define my custom method (say customMethod) in it, and define an implicit method that converts a DataFrame to MyClass.
implicit def dataFrametoMyClass(df: DataFrame): MyClass = new MyClass(df)
Thus I can call:
dataFrame.customMethod()
Is this the correct way to do it? I am open to suggestions.
Your way is the way to go (see [1]). I solved it a little differently, but the approach is similar:
import scala.language.implicitConversions
import org.apache.spark.sql.DataFrame

object ExtraDataFrameOperations {
  object implicits {
    // explicit return type, as recommended for implicit definitions
    implicit def dFWithExtraOperations(df: DataFrame): DFWithExtraOperations = DFWithExtraOperations(df)
  }
}

case class DFWithExtraOperations(df: DataFrame) {
  def customMethod(param: String): DataFrame = {
    // do something fancy with the df
    // or delegate to some implementation
    //
    // here, just as an illustrative example: do a select
    df.select(df(param))
  }
}
To use the new customMethod method on a DataFrame:
import ExtraDataFrameOperations.implicits._
val df = ...
val otherDF = df.customMethod("hello")
Instead of using an implicit method (see above), you can also use an implicit class:
object ExtraDataFrameOperations {
  implicit class DFWithExtraOperations(df: DataFrame) {
    def customMethod(param: String): DataFrame = {
      // do something fancy with the df
      // or delegate to some implementation
      //
      // here, just as an illustrative example: do a select
      df.select(df(param))
    }
  }
}
import ExtraDataFrameOperations._
val df = ...
val otherDF = df.customMethod("hello")
If you want to avoid the additional import, turn the object ExtraDataFrameOperations into a package object and store it in a file called package.scala within your package.
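As a rough sketch of the package object variant, assuming Spark SQL is on the classpath: the package name com.example.etl and the Job object are hypothetical, chosen only for illustration. Both parts are shown in one compilation unit here; in a real project the first part would live in package.scala.

```scala
// Part 1: would normally be the file com/example/etl/package.scala
// (com.example.etl is a hypothetical package name)
package com.example {
  import org.apache.spark.sql.DataFrame

  package object etl {
    // Members of the package object are visible throughout
    // com.example.etl without any import.
    implicit class DFWithExtraOperations(df: DataFrame) {
      def customMethod(param: String): DataFrame = df.select(df(param))
    }
  }
}

// Part 2: any other file in the same package (Job is hypothetical)
package com.example.etl {
  import org.apache.spark.sql.DataFrame

  object Job {
    // No import of DFWithExtraOperations is needed here.
    def run(df: DataFrame): DataFrame = df.customMethod("hello")
  }
}
```

Code outside com.example.etl would still need an explicit import com.example.etl._ to see customMethod.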
[1] The original blog "Pimp my library" by M. Odersky is available at http://www.artima.com/weblogs/viewpost.jsp?thread=179766
There is a slightly simpler approach: just declare MyClass as implicit:
implicit class MyClass(df: DataFrame) { def myMethod = ... }
This automatically creates the implicit conversion method (also called MyClass). You can also make it a value class by adding extends AnyVal (the constructor parameter must then be declared as a val), which avoids some overhead by not actually creating a MyClass instance at runtime, but this is very unlikely to matter in practice.
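A minimal sketch of the value class variant, assuming Spark SQL is on the classpath. The enclosing object DataFrameSyntax and the body of myMethod are hypothetical; an implicit class cannot be declared at the top level in Scala 2, so it needs some enclosing object.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical holder object for the extension methods.
object DataFrameSyntax {
  // A value class wraps exactly one `val` parameter and extends AnyVal.
  implicit class MyClass(val df: DataFrame) extends AnyVal {
    // Hypothetical example method: keep only the named column.
    def myMethod(column: String): DataFrame = df.select(df(column))
  }
}
```

After import DataFrameSyntax._, a call such as df.myMethod("hello") resolves through the implicit class just as in the non-value-class version.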
Finally, putting MyClass into a package object will allow you to use the new methods anywhere in this package without requiring an import of MyClass, which may be a benefit or a drawback for you.