I use Scala implicit classes to extend objects I work with frequently. As an example, I have a method similar to this defined on Spark DataFrame:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

implicit class DataFrameExtensions(df: DataFrame) {
  // Group by every column so exact duplicates collapse into one row (plus a count column).
  def deduplicate: DataFrame =
    df.groupBy(df.columns.map(col): _*).count
}
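For illustration, with the implicit class in scope the extension reads like a built-in DataFrame method (the SparkSession setup and input path below are hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
val people = spark.read.json("people.json")   // hypothetical input file
val grouped = people.deduplicate              // resolves through DataFrameExtensions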
But implicit conversions are not applied if the class already defines a method with the same name. What happens if I later upgrade to a new version of Spark that defines a DataFrame#deduplicate method? Client code will silently switch to the new implementation, which might cause subtle errors (or obvious ones, which are less problematic).
Using reflection, I can throw a runtime error if DataFrame already defines deduplicate before my implicit adds it. Theoretically, then, if my implicit method conflicts with an existing one, I can detect the conflict and rename my implicit version. However, once I upgrade Spark, run the app, and detect the issue, it's too late to use the IDE to rename the old method, since any references to df.deduplicate now refer to the native Spark version. I would have to revert my Spark version, rename the method through the IDE, and then upgrade again. Not the end of the world, but not a great workflow.
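Something along these lines, using plain Java reflection (the name ImplicitConflictCheck and the method name are arbitrary):

import org.apache.spark.sql.DataFrame

object ImplicitConflictCheck {
  // Fail fast if DataFrame (an alias for Dataset[Row]) already declares a
  // public method with the given name, i.e. the extension now clashes with it.
  def assertNotNative(methodName: String): Unit = {
    val alreadyDefined = classOf[DataFrame].getMethods.exists(_.getName == methodName)
    require(!alreadyDefined,
      s"DataFrame now defines '$methodName' natively; rename the extension method.")
  }
}

Called once at application startup, e.g. ImplicitConflictCheck.assertNotNative("deduplicate"), this turns the silent switch into an immediate failure.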
Is there a better way to deal with this scenario? How can I use the "pimp my library" pattern safely?
You could add a test to the test suite of DataFrameExtensions that ensures certain code snippets do not compile. Maybe something like this:
"(???: DataFrame).deduplicate" shouldNot compile
If it compiles without your implicit conversion in scope, it means that the method deduplicate has been introduced by the Spark library. In that case, the test fails, and you know that you have to update your implicits.
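Wrapped in a full spec, it could look roughly like this, assuming ScalaTest 3.1+ with Matchers (the spec name is arbitrary); crucially, the test file must not import DataFrameExtensions, so the snippet can only compile if Spark itself provides deduplicate:

import org.apache.spark.sql.DataFrame
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

// Deliberately does NOT import DataFrameExtensions: without the implicit in
// scope, deduplicate can only resolve if Spark itself defines it.
class DataFrameExtensionsConflictSpec extends AnyFlatSpec with Matchers {
  "DataFrame" should "not define deduplicate natively" in {
    "(???: DataFrame).deduplicate" shouldNot compile
  }
}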