I have an application in SparkSQL which returns a large number of rows that are very difficult to fit in memory, so I will not be able to use the collect function on the DataFrame. Is there a way I can get all of these rows as an Iterable instead of getting the entire result as a list?
I am executing this SparkSQL application using yarn-client.
ToLocalIterator Method (Microsoft.Spark.Sql): returns an iterator that contains all of the rows in this DataFrame. The iterator will consume as much memory as the largest partition in this DataFrame; with prefetch it may consume up to the memory of the 2 largest partitions.
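The same semantics are available on the Scala Dataset API. Below is a minimal sketch, assuming a toy DataFrame built with spark.range and a placeholder column name "value"; only one partition needs to fit in driver memory at a time.

import org.apache.spark.sql.{Row, SparkSession}
import scala.collection.JavaConverters._

val spark = SparkSession.builder().appName("toLocalIteratorSketch").getOrCreate()

// Hypothetical DataFrame; in practice this would be the result of your SparkSQL query
val df = spark.range(0, 1000000).toDF("value")

// Dataset.toLocalIterator returns a java.util.Iterator[Row];
// rows are fetched one partition at a time rather than all at once
val rows: Iterator[Row] = df.toLocalIterator().asScala
rows.take(5).foreach(println)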
Iteration over rows using itertuples(): to iterate over the rows of a pandas DataFrame, we can apply the itertuples() function, which returns a tuple for each row. The first element of the tuple is the row's corresponding index value, while the remaining elements are the row values.
DataFrame is a collection of rows with a schema and is the result of executing a structured query (once it has been executed). DataFrame builds on the immutable, in-memory, resilient, distributed and parallel capabilities of RDD, and applies a structure called a schema to the data. In Spark 2.0.0, DataFrame is a mere type alias for Dataset[Row].
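A small Scala sketch of that relationship, using a hypothetical two-column DataFrame with made-up column names "key" and "count":

import org.apache.spark.sql.{Dataset, Row, SparkSession}

val spark = SparkSession.builder().appName("dataFrameAliasSketch").getOrCreate()
import spark.implicits._

// In Spark 2.x a DataFrame is just Dataset[Row]
val df: Dataset[Row] = Seq(("a", 1), ("b", 2)).toDF("key", "count")

df.printSchema()   // the schema applied on top of the data
val rdd = df.rdd   // the underlying RDD[Row] the DataFrame is built on
println(rdd.getNumPartitions)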
Below is an example of iterating through a DataFrame using foreach (see the sketch that follows). If you have a small dataset, you can also convert the PySpark DataFrame to pandas and iterate through it there; use the spark.sql.execution.arrow.enabled config to enable Apache Arrow with Spark.
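A minimal Scala sketch of the foreach approach, assuming a small made-up DataFrame with a single "word" column; note that the function runs on the executors, not the driver:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("foreachSketch").getOrCreate()
import spark.implicits._

// Hypothetical small DataFrame used only for illustration
val df = Seq("alpha", "beta", "gamma").toDF("word")

// foreach applies the function to each row on the executors;
// nothing is collected back to the driver, and println output
// ends up in the executor logs, not the driver console
df.foreach(row => println(row.getString(0)))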
Generally speaking, transferring all the data to the driver looks like a pretty bad idea, and most of the time there is a better solution out there, but if you really want to go with this you can use the toLocalIterator method on an RDD:
// Placeholder for the DataFrame produced by your SparkSQL query
val df: org.apache.spark.sql.DataFrame = ???
df.cache // Optional, to avoid repeated computation, see docs for details
// Pulls rows to the driver one partition at a time instead of all at once
val iter: Iterator[org.apache.spark.sql.Row] = df.rdd.toLocalIterator
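One way you might consume that iterator (a sketch; the println is just a placeholder for your own per-row processing):

// Only a single partition has to fit in driver memory at any point
iter.foreach { row =>
  println(row)
}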