Is there any specific reason for the collectAsList method of the Spark DataFrame API to return a java.util.List instead of a Scala List?
I believe it's mostly a convenience function for Java users. Looking at the git logs (and also the @since annotation), it was introduced in the initial merge of the DataFrame API, so it wasn't necessarily added in response to a particular need. It might look like it also makes things simpler for the Python API, since some APIs return Java types because they are easier to interface with from Python (through py4j), but that doesn't appear to be the case here: the Python API collects by turning the DataFrame into an RDD and collecting on the RDD.
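For Scala users, the practical difference can be sketched as follows. This is a minimal illustration, not code taken from Spark itself: it assumes spark-sql is on the classpath and a local master is acceptable, and the object name `CollectAsListSketch`, the helper `toScalaList`, and the sample data are all hypothetical. `collect()` already returns a Scala `Array[Row]`, while the `java.util.List[Row]` from `collectAsList()` can be converted with the standard collection converters if a Scala `List` is needed.

```scala
import org.apache.spark.sql.{Row, SparkSession}
// On Scala 2.13+, scala.jdk.CollectionConverters._ is the non-deprecated equivalent.
import scala.collection.JavaConverters._

object CollectAsListSketch {
  // Hypothetical helper: copies a java.util.List into an immutable Scala List.
  def toScalaList[T](javaList: java.util.List[T]): List[T] =
    javaList.asScala.toList

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session is fine for this illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("collectAsList-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq((1, "a"), (2, "b")).toDF("id", "label")

    // Scala-friendly: returns Array[Row].
    val scalaRows: Array[Row] = df.collect()

    // Java-friendly: returns java.util.List[Row].
    val javaRows: java.util.List[Row] = df.collectAsList()

    // Explicit conversion when a Scala List is wanted.
    val asScalaList: List[Row] = toScalaList(javaRows)

    println(s"collect() returned ${scalaRows.length} rows")
    println(s"collectAsList() returned ${asScalaList.size} rows")

    spark.stop()
  }
}
```

In Scala code it is usually simpler to call `collect()` directly and use `.toList` on the result; `collectAsList()` is mainly useful when the caller is Java code that expects a `java.util.List`.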