
collectAsList in Spark DataFrame

Is there any specific reason for the collectAsList method of the Spark DataFrame API to return a java.util.List instead of a Scala List?

sourabh asked Oct 19 '22 23:10



1 Answer

I believe it's mostly a convenience function for Java users, and returning a Java type can also make things much simpler for the Python API. Looking at the git logs (and also the since annotation), it was introduced in the initial merge of the DataFrame API, so it wasn't necessarily added in response to a particular need. Some of the APIs return Java types because they are easier to interface with from Python (through Py4J), but that doesn't appear to be the case here: the Python API collects by turning the DataFrame into an RDD and collecting on the RDD.
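For Scala callers who still want a Scala List, here is a minimal sketch of the conversion. It does not depend on Spark: fakeCollectAsList is a hypothetical stand-in for the java.util.List that df.collectAsList() returns, and the object and method names are made up for illustration. It assumes Scala 2.13 or later; on 2.12 and earlier the equivalent import would be scala.collection.JavaConverters._.

```scala
import java.util.{ArrayList => JArrayList, List => JList}
import scala.jdk.CollectionConverters._

object CollectAsListSketch {
  // Hypothetical stand-in for df.collectAsList(), which returns a java.util.List.
  // Plain strings are used here instead of Spark Row objects.
  def fakeCollectAsList(): JList[String] = {
    val rows = new JArrayList[String]()
    rows.add("alice")
    rows.add("bob")
    rows
  }

  // Wrap the Java list as a Scala Buffer view, then copy it into an immutable Scala List.
  def toScalaList[A](javaRows: JList[A]): List[A] =
    javaRows.asScala.toList
}
```

With a real DataFrame, the same idea would presumably read df.collectAsList().asScala.toList. Alternatively, df.collect() returns a Scala Array, so df.collect().toList avoids the Java type altogether.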

Holden answered Nov 15 '22 06:11
